[Preprocessing] Data Reduction - Dimensionality Reduction

2022. 3. 7. 22:10 · 🔍 Data Analysis/02. Data Processing

2022.03.06 - [Data Analysis/02. Data Processing] - [Preprocessing] Data Transformation

 


In the previous post, I studied data transformation.

In this post, I'm going to study Data Reduction, the third type of preprocessing!

 

1. Data Cleaning - handling missing values, etc.

2. Data Transformation - normalization, etc.

3. Data Reduction - dimensionality reduction, etc.


Data Reduction

 

There are two ways to reduce data.

 

1. Dimensionality reduction - reducing the number of features (columns)

2. Numerosity reduction - reducing the size of the data

 


*First, I'll write about the first method, dimensionality reduction!

Dimensionality reduction

 

1. SVD

 

: Decomposing an m x n matrix A into U of size m x m, Σ of size m x n, and Vt of size n x n (A = U Σ Vt) by singular value decomposition (SVD) is called Full SVD.

However, Full SVD is rarely used in practice. Instead, the off-diagonal part of Σ and the diagonal entries whose singular value is 0 are removed, together with the corresponding columns of U and rows of Vt, so that SVD is applied in a lower-dimensional form. This approach is called Truncated SVD. (Strictly speaking, dropping only the zero singular values still reproduces A exactly; the approximation comes from also dropping small nonzero singular values and keeping only the k largest.)

The number of singular values to keep (k) can be chosen freely.
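To make the sizes above concrete, here is a minimal sketch that compares the shapes returned by a full and a reduced SVD. The 3 x 5 matrix `A_demo` and all `_demo`/`_full`/`_red` names are hypothetical and not part of the practice below, and it uses `numpy.linalg.svd` rather than the `scipy.linalg.svd` imported later.

```python
import numpy as np

# Hypothetical 3 x 5 example matrix, chosen only to show the shapes.
A_demo = np.arange(15, dtype=float).reshape(3, 5)

# Full SVD: U is m x m, VT is n x n, s holds min(m, n) singular values.
U_full, s_full, VT_full = np.linalg.svd(A_demo, full_matrices=True)

# Reduced SVD: rows of VT that would only meet zeros in Sigma are dropped.
U_red, s_red, VT_red = np.linalg.svd(A_demo, full_matrices=False)

print(U_full.shape, s_full.shape, VT_full.shape)
print(U_red.shape, s_red.shape, VT_red.shape)

# Truncated SVD: keep only the k largest singular values.
k_demo = 1
A_k = U_red[:, :k_demo] @ np.diag(s_red[:k_demo]) @ VT_red[:k_demo, :]
print(A_k.shape)
```

The reduced form still rebuilds `A_demo` exactly, while `A_k` is in general only an approximation.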

 

 

~Practice~

 

1. Import packages

from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd

 

2. Create a 5 by 10 matrix

A=array([[1,2,3,4,5,6,7,8,9,10],[11,12,13,14,15,16,17,18,19,20],[21,22,23,24,25,26,27,28,29,30],[31,32,33,34,35,36,37,38,39,40],[51,52,53,54,55,56,57,58,59,60]])
print(A)

 

3. Perform SVD

=> The svd function returns three values, which are stored in U, s and VT

U, s, VT = svd(A)

 

4. Restore the original matrix with matrix products!

 

~s is returned as a 1-D vector, so rebuild it as a diagonal matrix~

# s is a 1-D vector, so rebuild the diagonal matrix (zeros off the diagonal)
# zero matrix
smatrix = zeros((len(s), len(s)))
# diagonal matrix
smatrix[:len(s), :len(s)] = diag(s)
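The square matrix built above is what the truncation steps need. If you want to multiply the untruncated U (5 x 5) and VT (10 x 10) directly, Σ has to be 5 x 10 instead. A minimal sketch, assuming the same matrix as in step 2 (the `_f` names are hypothetical, and `numpy.linalg.svd` stands in for `scipy.linalg.svd`):

```python
import numpy as np

# Same values as the 5 x 10 matrix in step 2 (rows start at 1, 11, 21, 31, 51).
A_f = np.array([np.arange(start, start + 10) for start in (1, 11, 21, 31, 51)], dtype=float)

U_f, s_f, VT_f = np.linalg.svd(A_f)      # full SVD: U_f is 5 x 5, VT_f is 10 x 10

# Rectangular Sigma with the singular values on its leading diagonal.
Sigma_f = np.zeros(A_f.shape)            # 5 x 10
Sigma_f[:len(s_f), :len(s_f)] = np.diag(s_f)

A_rebuilt = U_f @ Sigma_f @ VT_f
print(np.allclose(A_f, A_rebuilt))
```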

 

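~k=4 (keep 4 singular values)~

k=4
# k=4, 3 and 2 all give the same result!
# slice into new names so U, smatrix and VT stay intact when k is changed
smatrix_k=smatrix[:k, :k]
VT_k=VT[:k, :]
U_k=U[:,:k]

# matrix product
B=U_k.dot(smatrix_k.dot(VT_k))
print(B)
# matches the original matrix (up to floating-point rounding)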
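Why do k=4, 3 and 2 all give the same result? Every row of A is the vector 1..10 plus a constant, so all rows are combinations of just two vectors and A has rank 2. Only two singular values are meaningfully nonzero. A quick check, as a sketch with hypothetical `_r` names and an explicit tolerance that I picked myself:

```python
import numpy as np

# Same values as the 5 x 10 matrix in step 2.
A_r = np.array([np.arange(start, start + 10) for start in (1, 11, 21, 31, 51)], dtype=float)

# Singular values only (no U or VT needed for this check).
s_r = np.linalg.svd(A_r, compute_uv=False)
print(s_r)

# Count singular values above a small tolerance (1e-8 is an arbitrary choice).
rank_r = np.linalg.matrix_rank(A_r, tol=1e-8)
print(rank_r)
```

Any k of at least the rank reproduces A, which is why the difference only shows up at k=1.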

 

~When k=1~

k=1
smatrix_k=smatrix[:k, :k]
VT_k=VT[:k, :]
U_k=U[:,:k]

# matrix product
B=U_k.dot(smatrix_k.dot(VT_k))
print(B)
# does not match the original matrix!

-> With k=1, we can see that the result no longer matches the original matrix!
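To put a number on "does not match", one option is the Frobenius norm of the difference between A and its rank-k reconstruction. For SVD this error equals the square root of the sum of the squared singular values that were dropped. A minimal sketch with hypothetical `_e` names:

```python
import numpy as np

# Same values as the 5 x 10 matrix in step 2.
A_e = np.array([np.arange(start, start + 10) for start in (1, 11, 21, 31, 51)], dtype=float)
U_e, s_e, VT_e = np.linalg.svd(A_e, full_matrices=False)

errors = {}
for k_e in range(1, 5):
    # Rank-k reconstruction from the k largest singular values.
    A_approx = U_e[:, :k_e] @ np.diag(s_e[:k_e]) @ VT_e[:k_e, :]
    errors[k_e] = np.linalg.norm(A_e - A_approx)   # Frobenius norm
    print(k_e, errors[k_e])

# The k=1 error should agree with the discarded singular values.
print(np.sqrt(np.sum(s_e[1:] ** 2)))
```

The error is clearly nonzero for k=1 and drops to rounding noise from k=2 onward.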


2. PCA

 

: The process of computing the principal components and then using only the first few of them to describe the data.
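Since the previous section covered SVD, here is a rough sketch of how the principal components can be computed by hand: center the data, take its SVD, and project onto the first few right singular vectors. The toy data and every name here are hypothetical, and this is only one way to compute PCA, not necessarily what scikit-learn does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated.
latent = rng.normal(size=(200, 1))
X_toy = np.hstack([latent, 2 * latent, rng.normal(size=(200, 1))])
X_toy = X_toy + 0.1 * rng.normal(size=X_toy.shape)

# 1) center each feature
X_centered = X_toy - X_toy.mean(axis=0)

# 2) SVD of the centered data; rows of VT_p are the principal directions
U_p, s_p, VT_p = np.linalg.svd(X_centered, full_matrices=False)

# 3) variance explained by each component
explained_variance = s_p ** 2 / (len(X_toy) - 1)
explained_ratio = explained_variance / explained_variance.sum()
print(explained_ratio)

# 4) keep only the first n_keep components
n_keep = 2
X_reduced = X_centered @ VT_p[:n_keep].T
print(X_reduced.shape)
```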

 

~Practice~

Let's load the iris data and run PCA (principal component analysis) on it.

 

1. Import packages and load the data

from sklearn import datasets
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
iris=datasets.load_iris()

 

 

2. Store the x and y columns separately

# store the y data (targets) in labels
labels=pd.DataFrame(iris.target)
labels.columns=['labels']

# store the x data (features) in data
data=pd.DataFrame(iris.data, columns=['Sepal length', 'Sepal width', 'Petal length', 'Petal width'])

 

 

3. Plot the original data

fig = plt.figure(figsize=(6,6))
# add_subplot attaches the 3D axes to the figure; Axes3D(fig) alone may leave the plot empty on recent matplotlib
ax = fig.add_subplot(projection='3d')
ax.view_init(elev=48, azim=134)
ax.scatter(data['Sepal length'],data['Sepal width'],data['Petal length'],c=labels['labels'],alpha=0.5)
ax.set_xlabel('Sepal length')
ax.set_ylabel('Sepal width')
ax.set_zlabel('Petal length')
plt.show()

-> A 3D plot is needed to show the data (and even then only three of the four features fit)...

 

 

4. PCA analysis

 

~Import packages~

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt
# standardize each feature (zero mean, unit variance)
scaler=StandardScaler()
# PCA (keeps all components by default)
pca=PCA()

# build the pipeline
pipeline=make_pipeline(scaler,pca)
pipeline.fit(data)
# check which components matter most for the variance of the data
print(pca.explained_variance_ratio_) # share of the variance explained by each component
print(pd.DataFrame(pca.components_, columns=iris.feature_names))

The first and second principal components together explain 96% of the variance!

The contribution(?) of each column to each component (the loadings) is shown in the table printed above!
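A common way to turn these ratios into a choice of how many components to keep is to look at the cumulative sum. The sketch below is self-contained and assumes a target of 95% explained variance; that threshold is my own example value, not something from the analysis above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four iris features, as the pipeline above does.
X_std = StandardScaler().fit_transform(load_iris().data)

pca_all = PCA().fit(X_std)
cumulative = np.cumsum(pca_all.explained_variance_ratio_)
print(cumulative)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95   # hypothetical target
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print(n_needed)
```

scikit-learn can also make this choice itself when `n_components` is given as a fraction, for example `PCA(n_components=0.95)`.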

 

 

5. Plot the variance

features=range(pca.n_components_)
plt.bar(features, pca.explained_variance_)
plt.xlabel('PCA feature')
plt.ylabel('variance')
plt.xticks(features)
plt.show()
# the first two components explain 96% in total

=> Confirms that two principal components explain 96% of the variance

 

 

6. Plot the data reduced to two principal components with PCA

# the 4 original features can be represented by 2 principal components
# note: this PCA is fit on the unscaled data, unlike the standardized pipeline in step 4
model=PCA(n_components=2)
pca_features=model.fit_transform(data)

xf=pca_features[:,0]
yf=pca_features[:,1]
plt.scatter(xf, yf, c=iris.target)
plt.show()

The original data needed a 3D plot, but after reducing the dimensionality with PCA it can be plotted simply in 2D!
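One thing to watch: step 6 fits PCA on the raw features, while the 96% figure in steps 4 and 5 comes from the standardized pipeline. If you want the 2-D coordinates to correspond to that 96%, a possible sketch is to put the scaler and a 2-component PCA into one pipeline. The names `iris_demo`, `pipe2` and `scores` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris_demo = load_iris()

# Standardize first, then keep the two leading principal components.
pipe2 = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = pipe2.fit_transform(iris_demo.data)

# make_pipeline names each step after its lowercased class name.
ratio2 = pipe2.named_steps['pca'].explained_variance_ratio_
print(scores.shape)
print(ratio2.sum())
```

`scores[:, 0]` and `scores[:, 1]` can then be passed to `plt.scatter` exactly like `xf` and `yf` above.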
