
[K-Data x LearningSpoons] 2-2. The Principles of Collaborative Filtering (CF)

xod22 2022. 1. 15. 12:40

In the last post we studied CB (Content-based Recommendation)!

This time I'd like to write about collaborative filtering (CF), which is used very widely!!

# Collaborative Filtering?

: CF(Collaborative Filtering)

 

[Figure: an example of CF]

 

=> Recommend the items preferred by users whose tastes are similar to user A's.

=> Performs well even without using any attributes of the items themselves!

# 1) User-based Collaborative Filtering

: How similar are the items that two users prefer?

Compute the similarity between users, then recommend the items preferred by the users most similar to me!
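Here is a minimal Python sketch of the user-based idea. The users, movies, and scores are hypothetical toy data, not taken from the post. Similarity is cosine similarity computed only over the movies both users have rated. The `min_rating=4` cutoff for what counts as "preferred" is also an assumption:

```python
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between users u and v over the movies both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    norm_u = math.sqrt(sum(ratings[u][m] ** 2 for m in common))
    norm_v = math.sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def recommend(target, min_rating=4):
    """Movies the most similar user rated >= min_rating that `target` has not rated yet.

    min_rating is a hypothetical threshold for "preferred", chosen for illustration.
    """
    others = [u for u in ratings if u != target]
    best = max(others, key=lambda u: cosine_similarity(target, u))
    return [m for m, r in ratings[best].items() if m not in ratings[target] and r >= min_rating]


print(recommend("B"))  # ['Star Wars']
```

On this toy data User A comes out as the user closest to B. A rated Star Wars highly, so Star Wars is what gets recommended.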

 

Example)

Suppose we want to predict the rating User B would give to Star Wars.

If we compare the ratings Users A and B gave to each movie, we can infer that the two users have similar tastes.

Therefore, User B's preference for Star Wars is predicted to be high!

 

How can we predict the rating accurately?

 

โ‘  Average

: Take the average of all the other users' ratings for Star Wars.

From User B's point of view, every rating counts equally!
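Here is a minimal Python sketch of the plain average, using hypothetical toy ratings that are not from the post. User C's low score pulls the prediction down exactly as much as User A's high score pulls it up:

```python
# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
}


def predict_average(target, movie):
    """Plain mean of every other user's rating for `movie`; all ratings count equally."""
    others = [r[movie] for u, r in ratings.items() if u != target and movie in r]
    return sum(others) / len(others) if others else None


print(predict_average("B", "Star Wars"))  # 3.5
```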

โ‘ก Weighted Average

- Average the ratings, using the cosine similarity between users as the weight!

- From User B's point of view, User A's rating (high cosine similarity) counts heavily and User C's rating counts only lightly in the predicted rating!
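Here is a minimal Python sketch of the weighted version. It assumes hypothetical toy ratings (not from the post) and cosine similarity over co-rated movies. Dividing by the sum of the weights keeps the result on the rating scale. Taking `abs()` of each weight is a common convention for when similarities can be negative, not something the post specifies:

```python
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between users u and v over the movies both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    norm_u = math.sqrt(sum(ratings[u][m] ** 2 for m in common))
    norm_v = math.sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_weighted(target, movie):
    """Average of the other users' ratings for `movie`, weighted by similarity to `target`."""
    num = den = 0.0
    for u, r in ratings.items():
        if u == target or movie not in r:
            continue
        w = cosine_similarity(target, u)
        num += w * r[movie]
        den += abs(w)  # abs() keeps the denominator positive if a similarity is negative
    return num / den if den else None


print(round(predict_weighted("B", "Star Wars"), 2))  # 3.84
```

The plain average on this data would be 3.5. The weighted prediction moves toward User A's 5 because A is much more similar to B than C is.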


However, my standard for giving ratings differs from other users'!
Even between users who look similar,
one user may give high ratings across the board,
while another may give low ratings across the board.

 

โ‘ข Weighted Average with deviation

: Use deviations!

Instead of using absolute ratings,

use the deviation: how much higher or lower each rating is than that user's own average rating.

 

Example)

If a user's average is 2.5 and they gave a movie 5, the deviation is large, so they rated that movie very highly.

On the other hand, a user who gives every item 5 has a deviation of 0 on every item, so their ratings make it hard to compare items with each other.

 

First, convert all the rating data into deviation data.
Then average those deviations (weighted by similarity) to get the predicted deviation.

* Final predicted rating = the target user's average rating + predicted deviation

=> This generally performs better than using absolute ratings
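Here is a minimal Python sketch of the three steps above. It uses hypothetical toy ratings and cosine weights; the exact weighting scheme is an assumption, since the post does not pin one down. Each neighbour contributes how far their Star Wars rating sits above or below their own average, and the weighted result is added back onto User B's own average:

```python
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
}


def mean_rating(user):
    """The user's average over all the movies they have rated."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def cosine_similarity(u, v):
    """Cosine similarity between users u and v over the movies both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    norm_u = math.sqrt(sum(ratings[u][m] ** 2 for m in common))
    norm_v = math.sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_with_deviation(target, movie):
    """Target's own mean plus the similarity-weighted mean deviation of the other users."""
    num = den = 0.0
    for u, r in ratings.items():
        if u == target or movie not in r:
            continue
        w = cosine_similarity(target, u)
        num += w * (r[movie] - mean_rating(u))  # deviation from u's own average
        den += abs(w)
    if not den:
        return None
    return mean_rating(target) + num / den


print(round(predict_with_deviation("B", "Star Wars"), 2))  # 4.24
```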

 

 

* Also worth knowing..!

โ‘ฃ K Nearest Neighbors Collaborative Filtering

Instead of computing over every user, predict my Star Wars rating using only the K = 25 to 50 users most similar to me!

: What if there are many users? The number of similarities to compute grows and performance suffers.. T_T

So we predict the rating using only the K users most similar to user u.

Note that K is a hyperparameter you have to tune yourself.
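Here is a minimal Python sketch of the K-nearest-neighbour restriction. It adds a hypothetical fourth user D and uses a tiny k=2 so the effect is visible on toy data; the post's K = 25 to 50 is meant for real datasets. For brevity, the prediction is the plain similarity-weighted average. A deviation-based version would restrict its neighbours the same way. `heapq.nlargest` picks the top k without sorting every user:

```python
import heapq
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
    "D": {"Iron Man": 4, "Hulk": 4, "Notting Hill": 3, "Star Wars": 3},
}


def cosine_similarity(u, v):
    """Cosine similarity between users u and v over the movies both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    norm_u = math.sqrt(sum(ratings[u][m] ** 2 for m in common))
    norm_v = math.sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def top_k_neighbors(target, movie, k):
    """The k users most similar to `target` among those who have rated `movie`."""
    candidates = [u for u, r in ratings.items() if u != target and movie in r]
    return heapq.nlargest(k, candidates, key=lambda u: cosine_similarity(target, u))


def predict_knn(target, movie, k=2):
    """Similarity-weighted average of the ratings from the k nearest neighbours only."""
    neighbors = top_k_neighbors(target, movie, k)
    num = sum(cosine_similarity(target, u) * ratings[u][movie] for u in neighbors)
    den = sum(abs(cosine_similarity(target, u)) for u in neighbors)
    return num / den if den else None


print(top_k_neighbors("B", "Star Wars", k=2))  # ['A', 'D']
print(round(predict_knn("B", "Star Wars", k=2), 2))
```

User C is the least similar to B here, so with k=2 C's rating is simply left out of the prediction.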

 

# 2) Item-based Collaborative Filtering

: Predict from the item side (?) / think of it as the mirror image of user-based CF.

- How similarly have two items been rated by users?

- Compute the similarity between items, then recommend other items closely related to the ones the user already likes.

When we want to know User B's rating for Star Wars,

we can see that Star Wars is highly similar to Iron Man and Hulk, while its similarity to Before Sunrise and Notting Hill is low.



=> So we can predict that User B's rating for Star Wars will be high, like B's ratings for Iron Man and Hulk!
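Here is a minimal Python sketch of item-item similarity, assuming hypothetical toy ratings that are not from the post. Each movie is treated as a column of ratings, and cosine similarity is taken over the users who rated both movies:

```python
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
    "D": {"Iron Man": 4, "Hulk": 4, "Notting Hill": 3, "Star Wars": 3},
}


def item_similarity(i, j):
    """Cosine similarity between the rating columns of movies i and j,
    using only the users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j)


for other in ["Iron Man", "Hulk", "Notting Hill"]:
    print(other, round(item_similarity("Star Wars", other), 2))
```

On this toy data, Star Wars scores roughly 0.98 with Iron Man and 0.97 with Hulk, but only about 0.66 with Notting Hill. That matches the pattern described above.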

 

โ‘  Average

Computed in exactly the reverse way from user-based!

It is the average of the ratings User(i) has given to the other items.
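Written as a formula, $u$ is the user the post calls User(i), $i$ is the target item (e.g. Star Wars), and $I_u$ is the set of other items $u$ has rated. This notation is assumed here:

$$
\hat{r}_{u,i} = \frac{1}{|I_u|} \sum_{j \in I_u} r_{u,j}
$$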

โ‘ก Weighted Average

Compute the cosine similarity between Star Wars and each of the other movies, then use those similarities as weights when averaging.

 

For example, with k=2,

suppose the similarity between Star Wars and Iron Man is 0.7

and the similarity between Star Wars and Hulk is 0.9.



Then User B's predicted rating for Star Wars

is B's own ratings for Iron Man and Hulk, averaged with weights 0.7 and 0.9.
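Written out with the two similarities given here, the weighted average looks like this. User B's own ratings for Iron Man and Hulk are left symbolic because the post does not state them:

$$
\hat{r}_{B,\text{Star Wars}} = \frac{0.7 \cdot r_{B,\text{Iron Man}} + 0.9 \cdot r_{B,\text{Hulk}}}{0.7 + 0.9}
$$

As a purely hypothetical illustration, if B had rated Iron Man 4 and Hulk 5, this would give $(0.7 \cdot 4 + 0.9 \cdot 5) / 1.6 = 7.3 / 1.6 = 4.5625$.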

 

 

* Also worth knowing..!

The item-based method can also fix k to cut down the computation and improve performance!

K is a hyperparameter you have to tune yourself.
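Here is a minimal Python sketch of item-based prediction with a top-k cutoff. It assumes hypothetical toy ratings and cosine similarity over users who rated both movies. The helper names `top_k_items` and `predict_item_knn` are made up for this illustration:

```python
import heapq
import math

# Hypothetical toy data (user -> {movie: rating}); not from the post.
ratings = {
    "A": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 1, "Star Wars": 5},
    "B": {"Iron Man": 5, "Hulk": 4, "Notting Hill": 2},
    "C": {"Iron Man": 1, "Hulk": 2, "Notting Hill": 5, "Star Wars": 2},
    "D": {"Iron Man": 4, "Hulk": 4, "Notting Hill": 3, "Star Wars": 3},
}


def item_similarity(i, j):
    """Cosine similarity between movies i and j over the users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j)


def top_k_items(user, movie, k):
    """The k movies `user` has rated that are most similar to `movie`."""
    rated = [m for m in ratings[user] if m != movie]
    return heapq.nlargest(k, rated, key=lambda m: item_similarity(movie, m))


def predict_item_knn(user, movie, k=2):
    """Similarity-weighted average of `user`'s own ratings for the k most similar movies."""
    neighbors = top_k_items(user, movie, k)
    num = sum(item_similarity(movie, m) * ratings[user][m] for m in neighbors)
    den = sum(abs(item_similarity(movie, m)) for m in neighbors)
    return num / den if den else None


print(top_k_items("B", "Star Wars", k=2))  # ['Iron Man', 'Hulk']
print(round(predict_item_knn("B", "Star Wars", k=2), 2))
```

Raising k to 3 on this data pulls in Notting Hill, which B rated 2, and the prediction drops. That is one reason k has to be tuned.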

 


# User-based VS Item-based

- Assumes a setting with many users and relatively few items

- This does not always hold, so treat it only as a reference!

| Feature | User-based | Item-based |
|---|---|---|
| 1 | Easy to implement; performance improves as the number K of similar users grows | Usually performs better than user-based |
| 2 | Gives more diverse recommendations than item-based | Using the similarity between items works better |
| 3 | Vulnerable to cold start (when there is little user data) | Easy to explain why something was recommended, e.g. "You enjoyed Iron Man before, so I'll recommend Star Wars, which is similar~" |
| 4 | Performs well with Pearson similarity | Performs well with cosine similarity |

 

# Limitations of CF (Collaborative Filtering)

1. The cold start problem

: CB (Content-based Recommendation) handles it well, but CF (Collaborative Filtering) is weak here

If there isn't enough data, recommendation quality drops

For new users or items with no data at all, recommendation is simply impossible

 

2. Computational efficiency

: The number of similarity computations grows as the numbers of users and items increase

Predictions become more accurate with more users and items, but the computation takes correspondingly longer.
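To put rough numbers on this growth: comparing every pair of $n$ users needs one similarity per unordered pair. This is a standard count, not a figure from the post, and it grows quadratically:

$$
\text{number of user-user similarities} = \binom{n}{2} = \frac{n(n-1)}{2}
$$

For example, 1,000 users need 499,500 similarities, while 1,000,000 users need 499,999,500,000. Item-based CF has the same count, with $n$ replaced by the number of items.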

 

3. Limits on long-tail recommendation

: CF results usually consist of the few items that many users prefer. Items in the long tail, in other words not-so-popular items, rarely get recommended!



-> It isn't Top-K recommendation as such, but an item only builds up data, and so becomes recommendable, after enough users have consumed it.

That is why the long-tail limitation arises.

 


Today I wrote about various aspects of CF (Collaborative Filtering)!

Even while putting this post together, there were parts I didn't fully understand..

so I plan to go back over them and study some more!



I hope this was helpful,

and have a great day:)



The end!

+ A figure to help with understanding..
