[Data Visualization] 09. Crosstabs

도개진 2023. 1. 2. 13:04

Crosstabs

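A crosstab (cross-tabulation) is an analysis tool for categorical data: it counts how often each combination of category values occurs.
pd.crosstab(index, columns, rownames=..., colnames=..., margins=..., normalize=...)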

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.DataFrame({'id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id3'],
                     'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                     'col2': ['d', 'd', 'd', 'c', 'c', 'd']})
print(data.id)

0    id1
1    id1
2    id1
3    id2
4    id2
5    id3
Name: id, dtype: object

# Crosstab 1: specify the index and the columns
# Count how often each col1 value occurs for each id
pd.crosstab(data.id, data.col1)

col1	a	b
id
id1	3	0
id2	0	2
id3	0	1
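
The rownames and colnames arguments listed in the signature are not exercised in this post. As a minimal sketch on the same toy data, they relabel the two axes of the result; the labels 'user' and 'group' are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id3'],
                     'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                     'col2': ['d', 'd', 'd', 'c', 'c', 'd']})

# rownames/colnames each take a list with one label per index/column key.
# 'user' and 'group' are hypothetical labels, not names from the post.
named = pd.crosstab(data.id, data.col1,
                    rownames=['user'], colnames=['group'])
print(named)
```

The counts are unchanged; only the axis names shown in the table header differ.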

# Add row and column totals (margins) to the crosstab
pd.crosstab(data.id, data.col1, margins=True)

col1	a	b	All
id
id1	3	0	3
id2	0	2	2
id3	0	1	1
All	3	3	6
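
The totals row and column are labeled 'All' by default. If that label is unsuitable, pandas also accepts a margins_name argument; the sketch below assumes the same toy data, and the label 'Total' is chosen arbitrarily.

```python
import pandas as pd

data = pd.DataFrame({'id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id3'],
                     'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                     'col2': ['d', 'd', 'd', 'c', 'c', 'd']})

# margins=True appends totals; margins_name renames the 'All' label
total = pd.crosstab(data.id, data.col1, margins=True, margins_name='Total')
print(total)

# The bottom-right cell is the grand total, i.e. the number of rows in data
print(total.loc['Total', 'Total'] == len(data))
```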

# Show proportions instead of counts (normalize=True divides each cell by the grand total)
pd.crosstab(data.id, data.col1, normalize=True)

col1	a	b
id
id1	0.5	0.000000
id2	0.0	0.333333
id3	0.0	0.166667
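
With normalize=True every cell is divided by the grand total (6 here), so the whole table sums to 1. normalize also accepts 'index' and 'columns' to compute proportions within each row or each column instead. A short sketch on the same toy data:

```python
import pandas as pd

data = pd.DataFrame({'id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id3'],
                     'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                     'col2': ['d', 'd', 'd', 'c', 'c', 'd']})

# Proportions within each row: every row sums to 1
by_row = pd.crosstab(data.id, data.col1, normalize='index')

# Proportions within each column: every column sums to 1
by_col = pd.crosstab(data.id, data.col1, normalize='columns')

print(by_row)
print(by_col)
```

Row-wise normalization answers "for this id, how are its rows split across col1?", while column-wise normalization answers "for this col1 value, which ids contribute to it?".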

Crosstabs on the titanic dataset

titanic = sns.load_dataset('titanic')

Survival by sex

titanic.info()
pd.crosstab(titanic.sex, titanic.survived)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB

survived	0	1
sex
female	81	233
male	468	109
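
Raw counts are hard to compare because the two groups differ in size. As a sketch, the counts printed above can be turned into survival rates per sex by dividing each row by its total. The frame below is retyped from that output so the snippet runs without downloading the dataset; with titanic loaded, pd.crosstab(titanic.sex, titanic.survived, normalize='index') should give the same proportions.

```python
import pandas as pd

# Counts copied from the crosstab output above (columns: survived = 0 / 1)
counts = pd.DataFrame({0: [81, 468], 1: [233, 109]},
                      index=pd.Index(['female', 'male'], name='sex'))
counts.columns.name = 'survived'

# Divide each row by its row total to get the share within each sex
rates = counts.div(counts.sum(axis=1), axis=0)
print(rates.round(3))
```

Read this way, roughly 74% of the women and 19% of the men in the dataset survived.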

Survival by sex and port of embarkation

# titanic.info()
cb = pd.crosstab(titanic.alive, [titanic.sex, titanic.embarked])
cb.plot.bar(rot=0)
plt.show()
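
Passing a list as the columns argument produces hierarchical (MultiIndex) columns, one level per key. The sketch below uses a few made-up rows in place of the real titanic data to show the resulting structure and how to pull out one group.

```python
import pandas as pd

# Made-up rows standing in for titanic.alive / sex / embarked
toy = pd.DataFrame({
    'alive':    ['yes', 'no', 'yes', 'no', 'yes', 'no'],
    'sex':      ['female', 'male', 'female', 'male', 'male', 'female'],
    'embarked': ['S', 'S', 'C', 'C', 'S', 'C'],
})

cb = pd.crosstab(toy.alive, [toy.sex, toy.embarked])

# The columns carry two levels: sex on the outside, embarked inside
print(cb.columns.names)

# Selecting the outer level returns the sub-table for that group
print(cb['female'])
```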

Survival by sex and passenger class

cb = pd.crosstab(titanic.alive, [titanic.sex, titanic['class']])
cb.plot.bar(rot=0)

<AxesSubplot:xlabel='alive'>
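
With alive on the index, the chart has only two bar groups and many colors. One alternative layout, sketched here on made-up rows rather than the real dataset, puts sex and class on the index and stacks the outcome, so each bar is one sex/class group.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the titanic columns used above
toy = pd.DataFrame({
    'alive': ['yes', 'no', 'yes', 'no', 'yes', 'no'],
    'sex':   ['female', 'male', 'female', 'male', 'male', 'female'],
    'class': ['First', 'Third', 'Second', 'Third', 'First', 'Third'],
})

# Groups on the index, outcome on the columns
cb = pd.crosstab([toy['sex'], toy['class']], toy['alive'])

# stacked=True piles the 'no' and 'yes' counts into a single bar per group
ax = cb.plot.bar(stacked=True, rot=45)
ax.set_ylabel('passengers')
plt.tight_layout()
```

Whether this reads better than the original grouping depends on the question being asked; it is one option, not a replacement for the plot above.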

Crosstabs on traffic accident data

car = pd.read_csv('../data/car_accient2016.csv')
car = car.iloc[:, [3, 4, 10, 12]]

car.발생지시도.value_counts()

경기    749
경북    477
충남    372
경남    359
서울    342
전남    327
전북    275
충북    221
강원    206
부산    153
대구    150
인천    137
대전     94
광주     85
제주     77
울산     70
세종     25
Name: 발생지시도, dtype: int64
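
value_counts() on one column and the margins of a crosstab are two views of the same totals. The sketch below checks this on made-up rows; only the column names 발생지시도 and 요일 come from the post, and the single-character weekday values are an assumption about the file.

```python
import pandas as pd

# Made-up rows; only the column names are taken from the post
toy = pd.DataFrame({
    '발생지시도': ['경기', '경기', '서울', '경기', '서울', '강원'],
    '요일':       ['월', '화', '월', '월', '수', '화'],
})

counts = toy['발생지시도'].value_counts()

# The 'All' column of a crosstab with margins holds the same per-region totals
ct = pd.crosstab(toy['발생지시도'], toy['요일'], margins=True)
totals = ct['All'].drop('All')  # drop the grand-total row

print(counts.sort_index())
print(totals.sort_index())
```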

Number of traffic accidents by province

Number of traffic accidents by province and day of the week

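import matplotlib as mpl
import matplotlib.font_manager  # make mpl.font_manager available explicitly

# Path to the NanumGothic font on the author's machine; adjust it for your environment
fontpath = '/home/bigdata/py39/lib/python3.9/site-packages/matplotlib/mpl-data/fonts/ttf/NanumGothic.ttf'
fname = mpl.font_manager.FontProperties(fname=fontpath).get_name()

# Use a font with Hangul glyphs so the Korean labels render correctly
mpl.rcParams['font.family'] = fname
mpl.rcParams['font.size'] = 12
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign from rendering as a broken glyph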

cb = pd.crosstab(car.발생지시도, car.요일).head(5)
cb.plot.bar(rot=0)

<AxesSubplot:xlabel='발생지시도'>
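
Note that crosstab sorts its index, so head(5) keeps the first five provinces in sort order, not the five with the most accidents, and the weekday columns also come out in sort order rather than calendar order. The sketch below shows one way to handle both on made-up rows; the weekday labels in week are an assumption about how the 요일 column is encoded.

```python
import pandas as pd

# Made-up rows; only the column names are taken from the post
toy = pd.DataFrame({
    '발생지시도': ['경기', '경기', '경기', '서울', '서울', '강원'],
    '요일':       ['월', '화', '월', '수', '월', '화'],
})

cb = pd.crosstab(toy['발생지시도'], toy['요일'])

# head() follows the sorted index, regardless of how large each row is
first_two = cb.head(2)

# Select the provinces with the most accidents instead
top = toy['발생지시도'].value_counts().head(2).index
top_cb = cb.loc[top]

# Put the weekday columns in calendar order (assumed labels), filling absent days with 0
week = ['월', '화', '수', '목', '금', '토', '일']
ordered = top_cb.reindex(columns=week, fill_value=0)
print(ordered)
```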

Number of traffic accidents by province and accident type

cb = pd.crosstab(car.발생지시도, car.사고유형_대분류).head(5)
cb.plot.bar(rot=0)

<AxesSubplot:xlabel='발생지시도'>
