📚 Study

Outlier removal (familysize), nominal variable conversion, removal of columns with no effect, creation of additional columns (TIPI, Machiavellianism score), VCL validity check (rows that responded to invalid words deleted), random forest (RandomForestClassifier) 0.7504515876
https://dacon.io/competitions/official/235902/overview/description SW중심대학 공동 AI 경진대회 ❮예선❯ (joint AI competition of SW-centered universities, preliminary round) - DACON
1. Study methods
https://www.datamanim.com/dataset/ADPpb/prepare.html
How to prepare - DataManim: share code on Kaggle and review other people's code
https://cafe.naver.com/sqlpd/30789
ADP practical exam study notes site (Python)
2. Past exam questions
https://lovelydiary.tistory.com/381
ADP) Collection of past ADP practical exam questions (exams 17, 18, 19, 20, 21, 22, 23, 24, 25)
After reading reviews saying that, rather than buying an ADP practical exam workbook, it is better to code up the various data mining examples in the written exam workbook yourself, the code examples ..
The ADP practical exam seems to include many questions that ask for a simple statistical analysis without providing the raw values, so I wrote up how to compute a confidence interval from summary statistics!
Example 1 problem
A pharmaceutical company is testing the efficacy of a pill it plans to release. In a clinical trial, a sample of 13 was drawn and the standard deviation came out to 3.2. Find the 95% confidence interval for the population variance of the pill.
Example 1 solution
from scipy.stats import chi2
import numpy as np
import pandas as pd
# degrees of freedom
df=13-1
# standard deviation
std=3.2
# chi-square distribution with df degrees of freedom
chi_=chi2(df)
# chi-square quantiles (not t-values, despite the variable names)
t_025=chi_.ppf(0.025)
t_975=chi_.ppf(0.975)
- Confidence interval
L_= round..
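Since the excerpt is cut off before the interval itself is computed, here is a minimal sketch of how the calculation could be finished, assuming the standard chi-square interval for a variance, ((n-1)s²/χ²_upper, (n-1)s²/χ²_lower). The variable names below are illustrative and are not taken from the original post.

```python
from scipy.stats import chi2

n = 13         # sample size from the problem statement
s = 3.2        # sample standard deviation from the problem statement
dof = n - 1    # degrees of freedom
alpha = 0.05   # 1 - confidence level

# Lower and upper chi-square quantiles for the chosen confidence level.
chi2_low = chi2.ppf(alpha / 2, dof)
chi2_high = chi2.ppf(1 - alpha / 2, dof)

# Dividing by the larger quantile gives the lower bound, and vice versa.
ci_lower = dof * s**2 / chi2_high
ci_upper = dof * s**2 / chi2_low

print(round(ci_lower, 2), round(ci_upper, 2))  # roughly 5.27 and 27.9
```

Note that the interval is for the variance; taking the square root of both bounds would give an interval for the standard deviation instead.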
I just took my first ADP practical exam! I thought 4 hours would feel long, but it was over in no time. There are hardly any reviews of the practical exam, and the ones that exist are old, so I want to briefly describe the current exam format.
Exam format
1. Open the page you are directed to and log in (the ID and password are attached to the computer)
2. Before the exam starts, you can try out the practice environment
3. If you enter the Python practice environment, a Jupyter environment appears right away and you can write practice code.
4. Once the exam starts, write the answers to all problems in a single file; anything other than code should be written in Markdown.
*There are far more sub-questions than you would expect. Some questions take a long time but carry few points, so you need to budget your time carefully!
5. 제..
- loc : selects by index name (a way of picking out specific values using human-readable labels)
- iloc : accesses rows or columns of the DataFrame by their integer position
loc example
df.loc[row indexer, column indexer]
1. Load the data
import pandas as pd
customer_m=pd.read_csv("customer_master.csv")
customer_m
2. Extract the row whose label is "0"
customer_m.loc[0]
Confirm that the row labeled "0" was extracted
3. Extract the row whose label is "1"
customer_m.loc[1]
Confirm that the row labeled "1" was extracted
4. Extract the column whose label is "customer_id"
customer_m.lo..
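With the default 0, 1, 2, ... index, loc and iloc return the same rows, which can hide the difference between them. The sketch below uses a small made-up DataFrame as a stand-in for customer_master.csv (the real file's columns are not shown in the excerpt, so "age" and the index labels are assumptions) to make the label-versus-position distinction visible.

```python
import pandas as pd

# Hypothetical stand-in for customer_master.csv.
customer_m = pd.DataFrame(
    {"customer_id": ["A001", "B002", "C003"], "age": [31, 45, 27]},
    index=[10, 20, 30],  # non-default labels so loc and iloc visibly differ
)

row_by_label = customer_m.loc[10]      # row whose index label is 10
row_by_position = customer_m.iloc[0]   # first row by position (the same row here)

col_by_label = customer_m.loc[:, "customer_id"]  # column selected by name
col_by_position = customer_m.iloc[:, 0]          # first column by position

# loc slices include the end label; iloc slices exclude the end position.
loc_slice = customer_m.loc[10:20]   # two rows: labels 10 and 20
iloc_slice = customer_m.iloc[0:1]   # one row: position 0 only
```

Calling customer_m.loc[0] on this frame would raise a KeyError, because no row carries the label 0, while customer_m.iloc[0] still works.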
No. 14
The following is order data generated by a company. Using the 80,009 records, build a model that predicts whether an order can arrive on time, and generate a csv that records the predicted probability of on-time arrival for the evaluation data.
Solution
1. Load the data
import pandas as pd
data=pd.read_csv("Train.csv")
2. Check the data types
data.info()
3. Split into X and y columns
X=data.drop('Reached.on.Time_Y.N', axis=1)
y=data[['Reached.on.Time_Y.N']]
4. Convert to dummy variables
X=pd.get_dummies(X)
5. Train/test split
from sklearn.model_selection import train_test_..
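The excerpt stops at the train/test split, so the following is only a sketch of one way the remaining steps could look, not the original post's solution. It runs on synthetic data: only the target column name comes from the excerpt, while the feature columns, the choice of RandomForestClassifier, and the output column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Train.csv; the feature columns are hypothetical.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Mode_of_Shipment": rng.choice(["Ship", "Flight", "Road"], size=n),
    "Weight_in_gms": rng.integers(1000, 6000, size=n),
    "Reached.on.Time_Y.N": rng.integers(0, 2, size=n),
})

X = pd.get_dummies(data.drop("Reached.on.Time_Y.N", axis=1))
y = data["Reached.on.Time_Y.N"]  # 1-D Series as the target

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Columns of predict_proba follow model.classes_; with classes [0, 1],
# column 1 holds the probability of class 1.
proba = model.predict_proba(X_valid)[:, 1]

submission = pd.DataFrame({"index": X_valid.index, "prob": proba})
csv_text = submission.to_csv(index=False)  # pass a file path instead to write a csv file
```

Selecting the target with single brackets gives a 1-D Series, which avoids the shape warning scikit-learn emits when a one-column DataFrame is passed as y. For a real submission, the same get_dummies columns would need to be applied to the evaluation data before calling predict_proba.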
Problem 13
The following is the Insurance dataset. Find the sum of the outliers in the Charges column. (An outlier is a value 1.5 standard deviations or more from the mean.)
Solution
1. Load the data
import pandas as pd
data=pd.read_csv("insurance.csv")
2. Check the data types
data.info()
3. Store the mean and standard deviation
mean=data['charges'].mean()
std=data['charges'].std()
4. Keep only the outlier rows
result=data[data['charges']>=mean+1.5*std]
5. Sum
result=result['charges'].sum()
6. Submit the result
print(result)
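The solution above keeps only values in the upper tail. If the problem statement is read as covering both tails (values at least 1.5 standard deviations away from the mean in either direction), the filter can be written with an absolute deviation. The sketch below compares the two readings on a small made-up series, not on the real insurance.csv.

```python
import pandas as pd

# Small made-up sample, chosen so that both tails contain an outlier.
charges = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 50.0, -30.0], name="charges")

mean = charges.mean()
std = charges.std()  # pandas defaults to the sample standard deviation (ddof=1)

# Upper tail only, as in the solution above.
upper_only = charges[charges >= mean + 1.5 * std]

# Both tails: absolute deviation from the mean of at least 1.5 standard deviations.
two_sided = charges[(charges - mean).abs() >= 1.5 * std]

print(upper_only.sum(), two_sided.sum())  # 50.0 20.0
```

When no value lies 1.5 standard deviations or more below the mean, the two filters select the same rows and give the same sum.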
xod22
Posts in the '📚 Study' category (page 2)