๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
AI

[PyTorch] The Transformers Library #1

by jjjaeunn 2023. 10. 30.

These are my notes from working through the Korean version of the Transformers Course provided by Hugging Face 🤗 on my own.


  • Using pipeline(), the first tool in the Transformers library
    • When text is fed into a pipeline function, three main steps run internally (a hand-rolled sketch of these steps follows the zero-shot example below):
    1. preprocessing
    2. passing the input text to the model
    3. postprocessing
  • The zero-shot-classification pipeline is a classifier that lets you classify text against a new set of labels of your own choosing, rather than the labels the library already provides. It can be used straight from the default pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
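The classifier returns a dict holding the input sequence, the candidate labels, and one score per label, sorted from most to least likely.

To make the three internal steps above concrete, here is a minimal sketch of doing them by hand. It assumes the sentiment checkpoint distilbert-base-uncased-finetuned-sst-2-english (the default for the sentiment-analysis pipeline); the example sentence is my own.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# 1. preprocessing: raw text -> token IDs (plus attention mask)
inputs = tokenizer("This is a course about the Transformers library", return_tensors="pt")

# 2. pass the encoded input to the model
with torch.no_grad():
    logits = model(**inputs).logits

# 3. postprocessing: logits -> probabilities -> readable labels
probs = torch.softmax(logits, dim=-1)
print({model.config.id2label[i]: p.item() for i, p in enumerate(probs[0])})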
  • ํŒŒ์ดํ”„๋ผ์ธ์—์„œ๋Š” default model๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์›ํ•˜๋Š” ๋ชจ๋ธ์„ ์„ ํƒํ•˜์—ฌ ํŠน์ •ํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜๋„ ์žˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ์—์„œ๋Š” pipeline์—์„œ distilgpt2 ๋ชจ๋ธ์„ ์„ ํƒํ•˜์—ฌ ํ•ด๋‹น generator๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 
  • ์ด ๋•Œ generator๋Š” ๋ฏธ์™„์„ฑ๋œ ํ…์ŠคํŠธ ์ƒ์„ฑ๊ธฐ๋กœ ์‚ฌ์šฉ๋˜๋ฉฐ, ๋””์ฝ”๋” ๋ชจ๋ธ์ธ gpt2๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. 
from transformers import pipeline

# Load the distilgpt2 model for text generation
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
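Each call returns a list of dicts, one per requested sequence, with the completed text stored under the 'generated_text' key.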

** ๋‹จ์ผํ•œ ๋ฌธ์žฅ ๊ฐ์ •๋ถ„์„ํ•˜๊ธฐ

** ๋‹ค์ค‘ ๋ฌธ์žฅ ๊ฐ์ •๋ถ„์„ํ•˜๊ธฐ

** ํŠน์ •ํ•œ ๋ชจ๋ธ(roberta) ๊ฐ€์ ธ์™€์„œ ๊ฐ์ •๋ถ„์„ํ•˜๊ธฐ 


Transformer๋Š” ์–ด๋–ป๊ฒŒ ์‚ฌ์šฉ๋˜๋Š”๊ฐ€? 

  • ๋Œ€๋ถ€๋ถ„์˜ Transformer ๋ชจ๋ธ์€ ์ž๊ฐ€์ง€๋„(self-supervised) ํ•™์Šต ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต๋˜์—ˆ๋‹ค. ์ฆ‰, ์‚ฌ๋žŒ์ด ์ง์ ‘ ๋ฐ์ดํ„ฐ์— ๋ ˆ์ด๋ธ”์„ ์ง€์ •ํ•  ํ•„์š”๊ฐ€ ์—†๋‹ค!
  • ์ „์ด ํ•™์Šต(Transfer Learning)์ด๋ž€ ์‚ฌ์ „ ํ•™์Šต์ด ์ˆ˜ํ–‰๋œ ํ›„์— fine-tuning์„ ์ง„ํ–‰ํ•˜๊ฒŒ ๋œ๋‹ค. → ๋ฐฉ๋Œ€ํ•œ ์–‘์˜ ๋ฐ์ดํ„ฐ๋กœ pretrain๋œ ๋ชจ๋ธ์— ํŒŒ์ธํŠœ๋‹์„ ํ†ตํ•ด์„œ ์‚ฌ์šฉ์ž๊ฐ€ ์›ํ•˜๋Š” ํƒœ์Šคํฌ๋ฅผ ์ ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค๋Š” ์žฅ์ 
  • Attention layers
    • ์–ดํ…์…˜ ๋ ˆ์ด์–ด(attention layers)๋ผ๋Š” ํŠน์ˆ˜ ๋ ˆ์ด์–ด๋ฅผ ํ†ตํ•ด ๋‹จ์–ด์˜ ํ‘œํ˜„์„ ์ฒ˜๋ฆฌํ•  ๋•Œ, ๋ฌธ์žฅ์˜ ํŠน์ • ๋‹จ์–ด๋“ค์— ํŠน๋ณ„ํ•œ ์ฃผ์˜(attention)๋ฅผ ๊ธฐ์šธ์ด๊ณ  ๋‚˜๋จธ์ง€๋Š” ๊ฑฐ์˜ ๋ฌด์‹œํ•˜๋„๋ก ๋ชจ๋ธ์— ์ง€์‹œํ•˜๊ฒŒ ๋จ
    • ์–ดํ…์…˜ ๋งˆ์Šคํฌ(attention mask)๋Š” ๋ชจ๋ธ์ด ํŠน์ • ๋‹จ์–ด์— ์ฃผ์˜๋ฅผ ์ง‘์ค‘ํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๋„๋ก ํ•จ
    • ์ธ์ฝ”๋” ๋ชจ๋ธ - ์ฃผ์–ด์ง„ ์ดˆ๊ธฐ ๋ฌธ์žฅ์„ ์†์ƒ(mask)์‹œํ‚ค๊ณ , ์†์ƒ์‹œํ‚จ ๋ฌธ์žฅ์„ ์›๋ž˜ ๋ฌธ์žฅ์œผ๋กœ ๋ณต์›ํ•˜๋Š” ๊ณผ์ •์„ ํ†ตํ•ด์„œ ๋ชจ๋ธ ํ•™์Šต์ด ์ง„ํ–‰๋จ.
    • ๋””์ฝ”๋” ๋ชจ๋ธ - ์ผ๋ฐ˜์ ์œผ๋กœ ๋ฌธ์žฅ์˜ ๋‹ค์Œ ๋‹จ์–ด ์˜ˆ์ธก ์ˆ˜ํ–‰์œผ๋กœ ์ด๋ฃจ์–ด์ง€๊ณ , ํ…์ŠคํŠธ ์ƒ์„ฑ๊ณผ ๊ด€๋ จ๋œ ์ž‘์—…์— ์ ํ•ฉํ•จ (GPT)๋Œ€์ƒ ์ž‘์—…์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ผ Transformer์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๋ณ€๊ฒฝํ•˜์—ฌ attention layer๋ฅผ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.