K-NN Model & Data Preprocessing

Posted Jul 29, 2024 Updated Aug 1, 2024

By SangYubLee

3 min read

Chapter 1. 나의 첫 머신러닝

인공지능이란?

사람처럼 학습하고 추론할 수 있는 지능을 가진 컴퓨터 시스템을 만드는 기술.

머신러닝이란?

규칙을 일일이 프로그래밍하지 않아도 자동으로 데이터에서 규칙을 학습하는 알고리즘을 연구하는 분야.
머신러닝의 대표적인 라이브러리는 사이킷런(scikit-learn)

딥러닝이란?

머신러닝 알고리즘 중에 인공신경망을 기반으로 한 방법들을 통칭하여 딥러닝이라고 부른다.
텐스플로우(구글), 파이토치(페이스북)의 오픈소스가 있음

Chapter 01-2 코랩과 주피터 노트북

코랩은 웹 브라우저에서 텍스트와 프로그램 코드를 자유롭게 작성할 수 있는 온라인 에디터라고 생각하면 된다.

텍스트 셀에 사용할 수 있는 마크다운

Chapter 01-3 마켓과 머신러닝

생선 분류 문제

도미 데이터

  
 bream_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0, 31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0, 35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0]
 bream_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0, 500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0, 700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0]

데이터 산점도를 통해 선형성 확인

  
 import matplotlib.pyplot as plt
 plt.scatter(bream_length, bream_weight)
 plt.xlabel('length')
 plt.ylabel('weight')
 plt.show()

시각화 결과

도미, 빙어 같이 산점도 찍기

  
 plt.scatter(bream_length, bream_weight)
 plt.scatter(smelt_length, smelt_weight)
 plt.xlabel('length')
 plt.ylabel('weight')
 plt.show()

K-NN 분석 (k-최근접 이웃)

  
 ### 데이터 전처리 ###
 length = bream_length + smelt_length 
 weight = bream_weight + smelt_weight 
 fish_data = [[l, w] for l, w in zip(length, weight)]
 # 리스트 병합 후 세로 방향으로 2차원 리스트 만들기
 fish_target = [1] * 35 + [0] * 14  # 정답(라벨링) 리스트 

 #### 모델 불러오기 #### 
 from sklearn.neighbors import KNeighborsClassifier
 model = KNeighborsClassifier() # default n_neighbors=5
 model.fit(fish_data, fish_target) # 타겟변수(종속변수)를 통한 알고리즘 훈련 
 model.score(fish_data, fish_target) # 모델 평가 점수(정확도)
 model.predict([[30, 600]]) # 테스트 데이터 넣어보기 

 kn49 = KNeighborsClassifier(n_neighbors=49) # 참고 데이터를 49개로 한 kn49 모델
 kn49.fit(fish_data, fish_target)
 kn49.score(fish_data, fish_target) # 0.71428

K-NN

This post is licensed under CC BY 4.0 by the author.