Data Analysis with Dask

4.5. Data Analysis with Dask

In this section, we walk through two data analysis examples built with Dask.

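import os

import matplotlib.pyplot as plt
%matplotlib inline
import dask.dataframe as dd
from dask.distributed import LocalCluster, Client
import pandas as pd

# silence pandas' SettingWithCopyWarning; this option only affects the current
# (client) process, not the worker processes started by LocalCluster
pd.options.mode.chained_assignment = None
# create a `LocalCluster` and connect a `Client` to it
cluster = LocalCluster()
client = Client(cluster)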

Example: Citi Bike

The Citi Bike dataset provides detailed trip records from New York City’s bike-sharing system. Each record includes the ride ID, bike type, start and end times, start and end stations, the latitude and longitude of both ends of the trip, and whether the rider is a member or a casual user. This makes the dataset well suited to both analysis and visualization.

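We start by reading the data. The Citi Bike dataset is split into multiple CSV files, one per month. The files used here are the Jersey City (`JC-`) trip data for January to April 2023. Dask can read all of them in a single call by passing a glob pattern containing the wildcard `*` to `dd.read_csv()`.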

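import sys
sys.path.append("..")
from utils import citi_bike  # `utils` is a helper module in the parent directory

bike_path = citi_bike()  # downloads the Citi Bike CSV files and returns their directory
# one glob pattern matches every monthly CSV file
ddf: dd.DataFrame = dd.read_csv(os.path.join(bike_path, "*.csv"))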

ddf.head()
Downloading JC-202301-citibike-tripdata.csv.zip
Downloading JC-202302-citibike-tripdata.csv.zip
Downloading JC-202303-citibike-tripdata.csv.zip
Downloading JC-202304-citibike-tripdata.csv.zip
| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0905B18B365C9D20 | classic_bike | 2023-01-28 09:18:10 | 2023-01-28 09:28:52 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Hamilton Park | JC009 | 40.735938 | -74.030305 | 40.727596 | -74.044247 | member |
| 1 | B4F0562B05CB5404 | electric_bike | 2023-01-23 20:10:12 | 2023-01-23 20:18:27 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Southwest Park - Jackson St & Observer Hwy | HB401 | 40.735938 | -74.030305 | 40.737551 | -74.041664 | member |
| 2 | 5ABF032895F5D87E | classic_bike | 2023-01-29 15:27:04 | 2023-01-29 15:32:38 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Marshall St & 2 St | HB408 | 40.735944 | -74.030383 | 40.740802 | -74.042521 | member |
| 3 | E7E1F9C53976D2F9 | classic_bike | 2023-01-24 18:35:08 | 2023-01-24 18:42:13 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Hamilton Park | JC009 | 40.735986 | -74.030364 | 40.727596 | -74.044247 | member |
| 4 | 323165780CA0734B | classic_bike | 2023-01-21 20:44:09 | 2023-01-21 20:48:08 | Hamilton Park | JC009 | Manila & 1st | JC082 | 40.727596 | -74.044247 | 40.721651 | -74.042884 | member |

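Next, we preprocess the data. This involves converting types, deriving new columns, and dropping invalid rows. `map_partitions()` applies an ordinary pandas function to every partition in parallel, and each call receives one partition as a pandas DataFrame. For each ride, the function computes the trip duration in minutes and extracts the year, month, day, and hour of the start time. It then estimates the speed as the straight-line distance between the start and end coordinates divided by the duration. Because this distance is measured directly in degrees of latitude and longitude, the speed is in degrees per minute, which is only meaningful for relative comparisons.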

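def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # map_partitions() passes each partition in as a pandas DataFrame
    df["started_at"] = pd.to_datetime(df["started_at"], errors="coerce")
    df["ended_at"] = pd.to_datetime(df["ended_at"], errors="coerce")

    # trip duration in minutes; rides with unparseable or non-positive times are dropped
    df["trip_duration"] = (df["ended_at"] - df["started_at"]).dt.total_seconds() / 60
    # .copy() makes the filtered frame independent, so the assignments
    # below do not trigger pandas' SettingWithCopyWarning
    df = df[df["trip_duration"] > 0].copy()

    df["year"] = df["started_at"].dt.year
    df["month"] = df["started_at"].dt.month
    df["day"] = df["started_at"].dt.day
    df["hour"] = df["started_at"].dt.hour

    # straight-line distance in degrees of latitude/longitude (not kilometres),
    # so speed is in degrees per minute
    df["distance"] = (
        (df["end_lat"] - df["start_lat"]) ** 2 + (df["end_lng"] - df["start_lng"]) ** 2
    ) ** 0.5
    df["speed"] = df["distance"] / df["trip_duration"]

    # dropna() returns a new DataFrame, so the result must be reassigned;
    # restricting it to `speed` keeps rides that only lack station names
    df = df.dropna(subset=["speed"])
    return df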


ddf = ddf.map_partitions(preprocess)
ddf.head()
| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | trip_duration | year | month | day | hour | distance | speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0905B18B365C9D20 | classic_bike | 2023-01-28 09:18:10 | 2023-01-28 09:28:52 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Hamilton Park | JC009 | 40.735938 | -74.030305 | 40.727596 | -74.044247 | member | 10.700000 | 2023 | 1 | 28 | 9 | 0.016248 | 0.001518 |
| 1 | B4F0562B05CB5404 | electric_bike | 2023-01-23 20:10:12 | 2023-01-23 20:18:27 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Southwest Park - Jackson St & Observer Hwy | HB401 | 40.735938 | -74.030305 | 40.737551 | -74.041664 | member | 8.250000 | 2023 | 1 | 23 | 20 | 0.011473 | 0.001391 |
| 2 | 5ABF032895F5D87E | classic_bike | 2023-01-29 15:27:04 | 2023-01-29 15:32:38 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Marshall St & 2 St | HB408 | 40.735944 | -74.030383 | 40.740802 | -74.042521 | member | 5.566667 | 2023 | 1 | 29 | 15 | 0.013074 | 0.002349 |
| 3 | E7E1F9C53976D2F9 | classic_bike | 2023-01-24 18:35:08 | 2023-01-24 18:42:13 | Hoboken Terminal - Hudson St & Hudson Pl | HB101 | Hamilton Park | JC009 | 40.735986 | -74.030364 | 40.727596 | -74.044247 | member | 7.083333 | 2023 | 1 | 24 | 18 | 0.016222 | 0.002290 |
| 4 | 323165780CA0734B | classic_bike | 2023-01-21 20:44:09 | 2023-01-21 20:48:08 | Hamilton Park | JC009 | Manila & 1st | JC082 | 40.727596 | -74.044247 | 40.721651 | -74.042884 | member | 3.983333 | 2023 | 1 | 21 | 20 | 0.006100 | 0.001531 |

Next, we plot the average speed for each type of bike. Because Dask evaluates lazily, `groupby().mean()` only builds a task graph. Calling `compute()` runs that graph and returns a pandas Series that Matplotlib can plot.

avg_speed_by_type = ddf.groupby('rideable_type')['speed'].mean().compute()
avg_speed_by_type.plot(kind='bar', title='Average Speed by Rideable Type')
plt.xlabel('Rideable Type')
# speed is distance in degrees of latitude/longitude divided by minutes
plt.ylabel('Average Speed (degrees/minute)')
plt.show()
[Figure: bar chart "Average Speed by Rideable Type", with Rideable Type on the x-axis and Average Speed on the y-axis]

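Next, we use `groupby()` with `count()` and `mean()` to aggregate the rides, then rank the stations with `sort_values()`. Note that `sort_values()` is applied to the pandas Series returned by `compute()`, not to a Dask object. Each `compute()` call also re-runs the whole pipeline, starting again from reading the CSV files.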

def process_data(df: dd.DataFrame):
    # number of rides per start station; compute() returns a pandas Series
    total_rides_by_start_station = (
        df.groupby("start_station_name")["ride_id"].count().compute()
    )
    print("Total rides groupby start stations:")
    print(total_rides_by_start_station.head())

    # rank stations by ride count, busiest first (plain pandas at this point)
    sorted_trips_by_start_station = total_rides_by_start_station.sort_values(
        ascending=False
    )
    print("\nSorted rides groupby start stations:")
    print(sorted_trips_by_start_station.head())

    # average trip duration (minutes) grouped by month and rideable_type
    duration_by_month_and_type = (
        df.groupby(["month", "rideable_type"])["trip_duration"].mean().compute()
    )
    print("\nAverage duration groupby `month` and `rideable_type`:")
    print(duration_by_month_and_type)

    return duration_by_month_and_type


trip_duration_by_member_and_month = process_data(ddf)
Total rides groupby start stations:
start_station_name
11 St & Washington St                4688
12 St & Sinatra Dr N                 3334
14 St Ferry - 14 St & Shipyard Ln    4100
4 St & Grand St                      2826
5 Corners Library                     938
Name: ride_id, dtype: int64

Sorted rides groupby start stations:
start_station_name
Grove St PATH                                   12649
Hoboken Terminal - River St & Hudson Pl         12151
South Waterfront Walkway - Sinatra Dr & 1 St     8509
Hoboken Terminal - Hudson St & Hudson Pl         7281
City Hall - Washington St & 1 St                 6503
Name: ride_id, dtype: int64

Average duration groupby `month` and `rideable_type`:
month  rideable_type
1      classic_bike      10.560105
       docked_bike      134.876000
       electric_bike     15.283021
2      classic_bike       9.565480
       docked_bike      192.740067
       electric_bike     11.009021
3      classic_bike      10.081960
       docked_bike      105.768713
       electric_bike     12.169278
4      classic_bike      12.316335
       docked_bike      122.971542
       electric_bike     12.398595
Name: trip_duration, dtype: float64
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Next, we visualize the average trip duration grouped by month and rideable type.

def plot_data(df):
    df.unstack().plot(kind="bar", stacked=True)
    plt.title("Average Trip Duration by Month and Rideable Type")
    plt.xlabel("Month")
    plt.ylabel("Average Trip Duration (minutes)")
    plt.legend(title="Rideable Type")
    plt.show()


plot_data(trip_duration_by_member_and_month)
Figure: Average Trip Duration by Month and Rideable Type (x-axis: Month; y-axis: Average Trip Duration (minutes); legend: Rideable Type).
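`plot_data` relies on `unstack()` to turn a result indexed by two levels into a table that pandas can draw as grouped bars. The sketch below shows that reshaping on a small hypothetical Series; it assumes, based on the plot labels, that the computed result is indexed by `(month, rideable_type)`, and the values are made up rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical toy values (minutes), NOT taken from the Citi Bike data.
# Assumed layout: a Series indexed by (month, rideable_type).
avg = pd.Series(
    [12.5, 9.8, 13.1, 10.4],
    index=pd.MultiIndex.from_tuples(
        [(1, "classic_bike"), (1, "electric_bike"),
         (2, "classic_bike"), (2, "electric_bike")],
        names=["month", "rideable_type"],
    ),
    name="trip_duration",
)

# unstack() moves the innermost index level (rideable_type) into columns:
# one row per month, one column per rideable type.
wide = avg.unstack()
print(wide)
```

Each row of `wide` becomes one position on the x-axis and each column one bar (or one stack segment). Because the values are averages, `stacked=True` adds them together, so the total height of a stacked bar has no direct meaning; `stacked=False` draws the bars side by side instead.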

Next, we add more data: the additional monthly files are read and preprocessed, and `dd.concat()` then appends their rows to the original DataFrame (`axis=0`).

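from utils import more_citi_bike

# download the additional monthly CSV files
bike_path2 = more_citi_bike()
ddf2: dd.DataFrame = dd.read_csv(os.path.join(bike_path2, "*.csv"))

# apply the `preprocess` function defined earlier to every partition
ddf2 = ddf2.map_partitions(preprocess)

# stack the rows of both DataFrames (axis=0)
concatenated_ddf = dd.concat([ddf, ddf2], axis=0)

trip_duration_by_member_and_month2 = process_data(concatenated_ddf)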
Downloading JC-202310-citibike-tripdata.csv.zip
Downloading JC-202311-citibike-tripdata.csv.zip
Downloading JC-202312-citibike-tripdata.csv.zip
Total rides groupby start stations:
start_station_name
11 St & Washington St                9190
12 St & Sinatra Dr N                 6934
14 St Ferry - 14 St & Shipyard Ln    7830
4 St & Grand St                      5703
5 Corners Library                    1745
Name: ride_id, dtype: int64

Sorted rides groupby start stations:
start_station_name
Hoboken Terminal - River St & Hudson Pl         23356
Grove St PATH                                   23034
Hoboken Terminal - Hudson St & Hudson Pl        13792
South Waterfront Walkway - Sinatra Dr & 1 St    13727
City Hall - Washington St & 1 St                11791
Name: ride_id, dtype: int64
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Average duration groupby `month` and `rideable_type`:
month  rideable_type
1      classic_bike      10.560105
       docked_bike      134.876000
       electric_bike     15.283021
2      classic_bike       9.565480
       docked_bike      192.740067
       electric_bike     11.009021
3      classic_bike      10.081960
       docked_bike      105.768713
       electric_bike     12.169278
4      classic_bike      12.316335
       docked_bike      122.971542
       electric_bike     12.398595
10     classic_bike      11.750429
       electric_bike     16.112951
11     classic_bike       9.647373
       electric_bike     10.886656
12     classic_bike       9.715068
       electric_bike     10.103621
Name: trip_duration, dtype: float64

Visualize it:

plot_data(trip_duration_by_member_and_month2)
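`plot_data` is a helper defined outside this excerpt. For readers working from this section alone, here is a hypothetical stand-in (named `plot_grouped_mean`, not the notebook's actual helper) that shows one common way to plot a Series with a two-level index: `unstack` moves the inner level into columns, so each group becomes one line across the outer level. The real `plot_data` may differ, for example by drawing bars instead of lines. The demo values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_grouped_mean(series: pd.Series, title: str = ""):
    """Hypothetical helper: outer index level on the x-axis, one line per inner level."""
    wide = series.unstack()  # inner index level becomes the columns
    ax = wide.plot(kind="line", marker="o", figsize=(8, 4))
    ax.set_xlabel(series.index.names[0] or "")
    ax.set_ylabel(series.name or "")
    ax.set_title(title)
    plt.tight_layout()
    return ax

# Usage with made-up values
demo = pd.Series(
    [10.0, 15.0, 9.0, 11.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "classic_bike"), (1, "electric_bike"), (2, "classic_bike"), (2, "electric_bike")],
        names=["month", "rideable_type"],
    ),
    name="trip_duration",
)
ax = plot_grouped_mean(demo, "Average trip duration (minutes)")
```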

Finally, we create a pivot table of the average trip duration per start station and rideable type, and visualize it as a bar chart.

# Dask's pivot_table needs `columns` to be categorical with known categories;
# categorize() scans the data once to collect them
concatenated_ddf = concatenated_ddf.categorize(columns=["rideable_type"])

# pivot table
pivot_table = concatenated_ddf.pivot_table(
    index="start_station_name",
    columns="rideable_type",
    values="trip_duration",
    aggfunc="mean",
).compute()

print("Pivot table:")
print(pivot_table)

# visualize it
pivot_table.plot(kind="bar", figsize=(10, 5))
plt.title("Average Trip Duration by Start Station and Rideable Type")
plt.xlabel("Start Station Name")
plt.ylabel("Average Trip Duration (minutes)")
plt.legend(title="Rideable Type")
plt.tight_layout()
plt.show()
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_12300/3612891295.py:17: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Pivot table:
rideable_type                      classic_bike  docked_bike  electric_bike
start_station_name                                                         
11 Ave & W 27 St                      16.583333          NaN            NaN
11 St & Washington St                  9.993954    20.685897       9.397788
12 St & Sinatra Dr N                  14.230417    38.415351      13.857875
14 St Ferry - 14 St & Shipyard Ln     13.091936   198.055702      15.262149
2 Ave & E 29 St                        7.983333          NaN            NaN
...                                         ...          ...            ...
Warren St                             10.612448    52.636364      11.075977
Washington St                         12.518832   130.303125      11.796525
Washington St & Gansevoort St          3.750000          NaN            NaN
Willow Ave & 12 St                    10.836851    20.166667      11.164558
York St & Marin Blvd                  11.622333    54.966667      25.448760

[161 rows x 3 columns]

Example: Census Income#

The Adult dataset (also known as the “Census Income” dataset) contains demographic and income records for individuals, with fields such as age, work class, education level, occupation, race, sex, weekly working hours, and income.

We start by reading the data. The adult.data file has no header row, so we pass the column names explicitly:

from utils import adult
import seaborn as sns

file_path = adult()
columns = [
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education-num",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
    "native-country",
    "income",
]
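adult_ddf: dd.DataFrame = dd.read_csv(
    os.path.join(file_path, "adult.data"),
    names=columns,
    header=None,
    # fields in adult.data are separated by ", "; strip the space after each comma
    skipinitialspace=True,
)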

adult_ddf.head()
Downloading adult.zip
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
0 39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
1 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
2 38 Private 215646 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
3 53 Private 234721 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K
4 28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
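A caveat about the raw file: adult.data separates fields with a comma followed by a space, and it marks missing values (for example in workclass, occupation and native-country) with `?` instead of leaving the field empty. pandas, and therefore dd.read_csv, treats `?` as an ordinary string, so the dropna() call in the preprocessing step below does not remove those rows. The minimal pandas sketch below uses a hypothetical two-line excerpt in the same layout to show what the skipinitialspace and na_values options change. Passing the same options to dd.read_csv would make dropna() drop the `?` rows, so the row counts reported below would be smaller.

```python
import io

import pandas as pd

# Hypothetical excerpt in the adult.data layout: ", " separators and "?" for a
# missing value (here in the workclass column).
raw = io.StringIO(
    "39, State-gov, 77516, <=50K\n"
    "54, ?, 180211, >50K\n"
)
cols = ["age", "workclass", "fnlwgt", "income"]

# Default parsing: strings keep their leading space and "?" is a normal value.
naive = pd.read_csv(raw, names=cols, header=None)

raw.seek(0)
# skipinitialspace strips the space after each comma; na_values turns "?" into NaN.
clean = pd.read_csv(
    raw, names=cols, header=None, skipinitialspace=True, na_values="?"
)

print(naive["workclass"].tolist())  # [' State-gov', ' ?']
print(len(naive.dropna()), len(clean.dropna()))  # 2 1
```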

Next, we preprocess the data.

categorical_columns = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "native-country",
    "income",
]

numeric_columns = [
    "age",
    "fnlwgt",
    "education-num",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
]


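def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # `map_partitions` calls this once per partition, so `df` is a pandas DataFrame
    df = df.copy()  # avoid mutating the partition that was passed in
    # convert text columns to the pandas `category` dtype
    for col in categorical_columns:
        df[col] = df[col].astype("category")

    # drop rows that contain NaN
    return df.dropna()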


adult_ddf = adult_ddf.map_partitions(preprocess)

adult_ddf.describe().compute()
age fnlwgt education-num capital-gain capital-loss hours-per-week
count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 32561.000000
mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 40.437456
std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 12.347429
min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 1.000000
25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 40.000000
50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 40.000000
75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 45.000000
max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 99.000000
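Because map_partitions calls preprocess() separately on each partition, each partition builds its own category set from the values it happens to contain, so the categories of the same column can differ between partitions. The pandas sketch below imitates this with two hypothetical partitions of the workclass column; the helper to_category() is illustrative and mirrors the conversion loop in preprocess().

```python
import pandas as pd

# Two hypothetical partitions of the same "workclass" column.
part1 = pd.DataFrame({"workclass": ["State-gov", "Private"]})
part2 = pd.DataFrame({"workclass": ["Private", "Self-emp-not-inc"]})


def to_category(df: pd.DataFrame) -> pd.DataFrame:
    # Same idea as preprocess(): convert the column within a single partition.
    df = df.copy()
    df["workclass"] = df["workclass"].astype("category")
    return df


cats1 = list(to_category(part1)["workclass"].cat.categories)
cats2 = list(to_category(part2)["workclass"].cat.categories)
print(cats1)  # ['Private', 'State-gov']
print(cats2)  # ['Private', 'Self-emp-not-inc']
```

When one consistent category set matters downstream, Dask's DataFrame.categorize() method, which scans the data to compute shared categories, may be worth considering; see the Dask documentation on categoricals.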

We draw a scatter plot to explore the relationship between income and other variables.

# `sns.pairplot` creates its own figure, so a separate `plt.figure()` call is not needed
sns.pairplot(
    adult_ddf.compute(),
    vars=numeric_columns,
    hue="income",
    palette="Set1",
    plot_kws={"alpha": 0.5},
)
plt.suptitle("Scatterplot Matrix by Income", y=1.02)
plt.show()
[Figure: Scatterplot Matrix by Income]

The scatterplot matrix suggests that, apart from age, the numeric variables do not differ markedly between the <=50K and >50K income groups. We therefore focus next on the relationship between age and income.

We use groupby() to compute the average age for each combination of education level and income group.
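Grouping on categorical columns has one subtlety. With observed=False, pandas (and Dask, which follows pandas here) returns a row for every combination of categories, including combinations that never occur, whose mean is NaN; with observed=True, only combinations present in the data are returned. Newer pandas versions warn that the default will change, so passing observed explicitly is a reasonable habit. A minimal pandas sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows with categorical grouping columns.
df = pd.DataFrame(
    {
        "education": pd.Categorical(["Bachelors", "Bachelors", "HS-grad"]),
        "income": pd.Categorical(["<=50K", ">50K", "<=50K"]),
        "age": [30, 50, 40],
    }
)

# observed=True: only the (education, income) pairs that actually occur.
seen = df.groupby(["education", "income"], observed=True)["age"].mean().reset_index()
# observed=False: every category combination; (HS-grad, >50K) gets NaN.
full = df.groupby(["education", "income"], observed=False)["age"].mean().reset_index()

print(len(seen), len(full))  # 3 4
```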

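# group by `education` and `income`, then take the mean age of each group;
# observed=True keeps only category combinations that actually occur
education_income_age = (
    adult_ddf.groupby(["education", "income"], observed=True)["age"]
    .mean()
    .reset_index()
    .compute()
)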

plt.figure(figsize=(10, 5))
sns.barplot(
    data=education_income_age, x="education", y="age", hue="income", palette="Set1"
)
plt.xticks(rotation=45)
plt.title("Average Age by Education Level and Income")
plt.xlabel("Education Level")
plt.ylabel("Average Age")
plt.legend(title="Income")
plt.tight_layout()
plt.show()
[Figure: Average Age by Education Level and Income]

The results show that, at every education level, people with an income >50K tend to have a higher average age, mostly between 40 and 50.

Next, we look at age and income at a finer granularity, using the number of years of education (education-num) instead of the education level.

# relationship between `education-num` and `income`
plt.figure(figsize=(10, 5))
sns.boxplot(
    data=adult_ddf.compute(), x="education-num", y="age", hue="income", palette="Set1"
)
plt.title("Age Distribution by Education Number and Income")
plt.xlabel("Years of Education")
plt.ylabel("Age")
plt.legend(title="Income")
plt.tight_layout()
plt.show()

# average age per (years of education, income) group
education_num_income_age = (
    adult_ddf.groupby(["education-num", "income"], observed=True)["age"]
    .mean()
    .reset_index()
    .compute()
)

plt.figure(figsize=(10, 5))
sns.lineplot(
    data=education_num_income_age,
    x="education-num",
    y="age",
    hue="income",
    marker="o",
    palette="Set1",
)
plt.title("Average Age by Years of Education and Income")
plt.xlabel("Years of Education")
plt.ylabel("Average Age")
plt.legend(title="Income")
plt.tight_layout()
plt.show()
[Figure: Age Distribution by Education Number and Income]
[Figure: Average Age by Years of Education and Income]

The results show that as the number of years of education increases, both the median and the mean age of individuals with an income >50K gradually increase, while the age gap between the two income groups narrows.
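The box plot summarizes each (education-num, income) group by its median age and the line plot by its mean age. The minimal pandas sketch below, on hypothetical toy rows, computes both statistics in a single groupby and then the mean-age gap between the two income groups at each education-num value, which is the quantity behind the narrowing-gap observation.

```python
import pandas as pd

# Hypothetical toy rows standing in for the adult data.
df = pd.DataFrame(
    {
        "education-num": [9, 9, 9, 9, 13, 13],
        "income": ["<=50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K"],
        "age": [25, 35, 60, 45, 45, 48],
    }
)

# Median (what the box plot draws) and mean (what the line plot draws) per group.
stats = (
    df.groupby(["education-num", "income"])["age"]
    .agg(["median", "mean"])
    .reset_index()
)

# Mean-age gap between the >50K and <=50K groups at each education-num value.
gap = stats.pivot(index="education-num", columns="income", values="mean")
gap["difference"] = gap[">50K"] - gap["<=50K"]
print(gap)
```

On the real data the same two lines of pandas code could be run on the computed education_num_income_age result, since it is already a pandas DataFrame.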

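Finally, we examine the relationship between education level and hourly wage. The dataset has no wage column, so we use a proxy: the net capital gain (capital-gain minus capital-loss) divided by the weekly working hours (hours-per-week), computed only for rows with a non-zero net capital gain.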
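As a worked example of this proxy, take the first two rows of the head() output above: the first has capital-gain 2174, capital-loss 0 and 40 hours per week, giving 2174 / 40 = 54.35; the second has a net capital gain of 0 and is filtered out. A minimal pandas sketch of the same steps on just these two rows (the .copy() after filtering is an addition that avoids pandas' SettingWithCopyWarning):

```python
import pandas as pd

# The first two rows of adult.data shown earlier, restricted to the needed columns.
df = pd.DataFrame(
    {
        "capital-gain": [2174, 0],
        "capital-loss": [0, 0],
        "hours-per-week": [40, 13],
    }
)

df["net_capital_gain"] = df["capital-gain"] - df["capital-loss"]
# keep only non-zero net capital gain; copy so the next assignment is not on a slice
df = df[df["net_capital_gain"] != 0].copy()
df["hourly_wage"] = df["net_capital_gain"] / df["hours-per-week"]
print(df["hourly_wage"].tolist())  # [54.35]
```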

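def calc_avg_income(df: dd.DataFrame) -> dd.DataFrame:
    # net capital gain of each person (use `df`, not the global `adult_ddf`)
    df["net_capital_gain"] = df["capital-gain"] - df["capital-loss"]
    # keep only rows with a non-zero net capital gain
    df = df[df["net_capital_gain"] != 0]
    # hourly wage proxy: net capital gain per weekly working hour
    df["hourly_wage"] = df["net_capital_gain"] / df["hours-per-week"]

    return df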


adult_ddf = calc_avg_income(adult_ddf)
# average hourly wage per (education, income) group
education_income_wage = (
    adult_ddf.groupby(["education", "income"], observed=True)["hourly_wage"]
    .mean()
    .reset_index()
    .compute()
)

plt.figure(figsize=(10, 5))
sns.barplot(
    data=education_income_wage,
    x="education",
    y="hourly_wage",
    hue="income",
    palette="Set1",
)
plt.xticks(rotation=45)
plt.title(
    "Average Hourly Wage by Education Level and Income (Non-zero Net Capital Gain)"
)
plt.xlabel("Education Level")
plt.ylabel("Average Hourly Wage")
plt.legend(title="Income")
plt.tight_layout()
plt.show()
[Figure: Average Hourly Wage by Education Level and Income (Non-zero Net Capital Gain)]

At lower education levels, the average hourly wage of the >50K group is significantly higher than that of the <=50K group. At higher education levels the gap narrows noticeably, although the >50K group still has the higher average hourly wage.

Overall, education level is strongly associated with income: average hourly wages differ clearly across education levels, and for individuals with an income >50K in particular, the average hourly wage rises substantially with higher levels of education.

client.shutdown()