

Collecting Seoul Real Estate Jeonse and Monthly Rent Price Data (Using the Public Data API)


※ This walkthrough was done on the Seoul Open Data Plaza website (https://data.seoul.go.kr/).

 

✋ Seoul Open Data Plaza - API key issuance and data collection

🎈 Integrated search - real estate jeonse/monthly rent prices

🎈 Open API

🎈 Apply for an authentication key

🎈 Authentication key issued

👉 Copy the issued authentication key and use it in your API calls.

 

🎈 Check the sample URL format / request parameters / output values

👉 You can check the output values returned after calling the Open API.

👉 If list_total_count exceeds 1,000, split the request into several calls, because the Open API cannot return more than 1,000 records per call (a loop is required!!)
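The splitting described above can be sketched as a small helper that turns a total record count into 1-based (start, end) index pairs, matching the last two path segments of the request URL. The names `page_ranges` and `PAGE_SIZE` and the total of 2,350 are made up for illustration; this is a minimal sketch, not code from the official guide.

```python
import math

PAGE_SIZE = 1000  # per-call limit stated above


def page_ranges(total_count, page_size=PAGE_SIZE):
    """Return 1-based, inclusive (start, end) pairs that cover total_count rows."""
    n_pages = math.ceil(total_count / page_size)
    return [
        (page * page_size + 1, min((page + 1) * page_size, total_count))
        for page in range(n_pages)
    ]


# Example with a made-up total of 2,350 records
for start, end in page_ranges(2350):
    print(start, end)
```

Each pair can then be substituted into the start and end positions of the request URL, one call per pair.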

 

🎈 Run a sample test with the conditions you want

🎈 Values must be entered in the specified format to return the desired information

👉 Download the guide for the language you plan to use.

 

🎈 Data format - xml / json

👉 Choose either xml or json as the format, then write the request URL following the sample URL format.

👉 Then run it in Google Colab to collect data from the API.
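As a rough sketch of writing the URL "following the sample URL format", the helper below assembles the same pattern used in the Colab section of this post (key, format, service name, start index, end index). The function name `build_url` and the constants are hypothetical, and `"sample"` is a placeholder, not a real key.

```python
BASE_URL = "http://openapi.seoul.go.kr:8088"
SERVICE = "tbLnOpendataRtmsV"  # service name used in this post


def build_url(service_key, data_format, start, end, service=SERVICE):
    """Assemble a request URL; data_format must be 'xml' or 'json'."""
    if data_format not in ("xml", "json"):
        raise ValueError("data_format must be 'xml' or 'json'")
    return f"{BASE_URL}/{service_key}/{data_format}/{service}/{start}/{end}/"


print(build_url("sample", "json", 1, 5))
```

Keeping the format as a parameter makes it easy to switch between the XML and JSON examples without retyping the URL.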

 

🎈 How to use the data

Convert the data (xml, json) 👉 convert it to a pandas DataFrame 👉 create a CSV file 👉 store it in a DB (database)
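The flow above might look like the following sketch. The rows are made-up stand-ins for API records, SQLite is an assumed choice of database (the post does not name one), and the file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Made-up rows standing in for records returned by the API
rows = [
    {"ACC_YEAR": "2023", "SGG_NM": "Sample-gu", "OBJ_AMT": "50000"},
    {"ACC_YEAR": "2023", "SGG_NM": "Example-gu", "OBJ_AMT": "72000"},
]

# 1) Convert to a pandas DataFrame (amounts arrive as strings)
df = pd.DataFrame(rows)
df["OBJ_AMT"] = pd.to_numeric(df["OBJ_AMT"])

# 2) Write a CSV file (utf-8-sig helps spreadsheet tools read Korean text)
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "seoul_real_estate.csv")
df.to_csv(csv_path, index=False, encoding="utf-8-sig")

# 3) Store the same table in a SQLite database file
db_path = os.path.join(out_dir, "seoul_real_estate.db")
conn = sqlite3.connect(db_path)
try:
    df.to_sql("real_estate", conn, if_exists="replace", index=False)
    stored = pd.read_sql("SELECT COUNT(*) AS n FROM real_estate", conn)
finally:
    conn.close()

print(int(stored["n"].iloc[0]))
```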

 

 

✋ Applying the URL in Google Colab

🎈 Seoul real estate XML API example

🔑 Import libraries

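# library
import lxml  # not called directly; BeautifulSoup's "lxml" parser needs it installed
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import math

service_key = 'YOUR_ISSUED_AUTH_KEY'  # paste the authentication key issued to you
# URL pattern: /{key}/{format}/{service}/{start index}/{end index}/
url = f'http://openapi.seoul.go.kr:8088/{service_key}/xml/tbLnOpendataRtmsV/1/5/'
print(url)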

👉 Handle the authentication key with care! (Do not expose it during development.)
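One common way to keep the key out of a shared notebook is to read it from an environment variable and mask it before printing the URL. This is only a sketch: the variable name `SEOUL_API_KEY` and both helper functions are hypothetical, and the `setdefault` line exists only so the demo runs without a real key.

```python
import os


def load_service_key(env_var="SEOUL_API_KEY"):
    """Read the key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def mask_url(url, service_key):
    """Return the URL with the key hidden, safe to print or log."""
    return url.replace(service_key, "****")


os.environ.setdefault("SEOUL_API_KEY", "demo-key")  # demo only; set it outside the notebook
service_key = load_service_key()
url = f"http://openapi.seoul.go.kr:8088/{service_key}/xml/tbLnOpendataRtmsV/1/5/"
print(mask_url(url, service_key))
```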

 

🔑 Check the API request

response = requests.get(url)
print(response.status_code)  # 200 means the HTTP request itself succeeded
print(response.content)      # raw response body as bytes

🔑 Parse the response as XML

soup = BeautifulSoup(response.content, "lxml")  # lxml's HTML parser lowercases tag names, hence the lowercase names below
print(soup)

🔑 Convert the XML to a Pandas DataFrame

# Collect the tags for each output field
years            = soup.find_all('acc_year')         # receipt year
sgg_cds          = soup.find_all('sgg_cd')           # district (gu) code
sgg_nms          = soup.find_all('sgg_nm')           # district (gu) name
bjdong_cds       = soup.find_all('bjdong_cd')        # legal-dong code
bjdong_nms       = soup.find_all('bjdong_nm')        # legal-dong name
land_gbns        = soup.find_all('land_gbn')         # lot number type
land_gbn_nms     = soup.find_all('land_gbn_nm')      # lot number type name
bonbeons         = soup.find_all('bonbeon')          # main lot number
bubeons          = soup.find_all('bubeon')           # sub lot number
bldg_nms         = soup.find_all('bldg_nm')          # building name
deal_ymds        = soup.find_all('deal_ymd')         # contract date
obj_amts         = soup.find_all('obj_amt')          # property price (10,000 KRW)
bldg_areas       = soup.find_all('bldg_area')        # building area (㎡)
tot_areas        = soup.find_all('tot_area')         # land area (㎡)
floors           = soup.find_all('floor')            # floor
right_gbns       = soup.find_all('right_gbn')        # right type
cntl_ymds        = soup.find_all('cntl_ymd')         # cancellation date
build_years      = soup.find_all('build_year')       # construction year
house_types      = soup.find_all('house_type')       # building use
req_gbns         = soup.find_all('req_gbn')          # report type
rdealer_lawdnms  = soup.find_all('rdealer_lawdnm')   # district of the reporting licensed real estate agent
# Use a loop to collect the text of each selected field
year_list           = []
sgg_cd_list         = []
bldg_nm_list        = []
obj_amt_list        = []
house_type_list     = []
rdealer_lawdnm_list = []

for year, sgg_cd, bldg_nm, obj_amt, house_type, rdealer_lawdnm in zip(years, sgg_cds, bldg_nms, obj_amts, house_types, rdealer_lawdnms):
    year_list.append(year.get_text())
    sgg_cd_list.append(sgg_cd.get_text())
    bldg_nm_list.append(bldg_nm.get_text())
    obj_amt_list.append(obj_amt.get_text())
    house_type_list.append(house_type.get_text())
    rdealer_lawdnm_list.append(rdealer_lawdnm.get_text())

df = pd.DataFrame({
    "acc_year": year_list,
    "sgg_cd": sgg_cd_list,
    "bldg_nm": bldg_nm_list,
    "obj_amt": obj_amt_list,
    "house_type": house_type_list,
    "rdealer_lawdnm": rdealer_lawdnm_list
})

df
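Because zip() pairs items by position, the column-by-column approach above can misalign values if any record lacks one of the tags. A possible alternative is to walk the XML one record at a time. The sketch below uses the standard library's ElementTree and assumes each record is wrapped in a `row` element with uppercase tag names (ElementTree keeps the original case, unlike the lowercasing parser above). The sample XML and its values are made up, and `rows_to_records` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative XML with made-up values; the second row has no BLDG_NM tag
SAMPLE_XML = """<tbLnOpendataRtmsV>
  <list_total_count>2</list_total_count>
  <row>
    <ACC_YEAR>2023</ACC_YEAR>
    <SGG_CD>11110</SGG_CD>
    <BLDG_NM>Sample Apt</BLDG_NM>
    <OBJ_AMT>50000</OBJ_AMT>
  </row>
  <row>
    <ACC_YEAR>2023</ACC_YEAR>
    <SGG_CD>11140</SGG_CD>
    <OBJ_AMT>72000</OBJ_AMT>
  </row>
</tbLnOpendataRtmsV>"""

FIELDS = ["ACC_YEAR", "SGG_CD", "BLDG_NM", "OBJ_AMT"]


def rows_to_records(xml_text, fields=FIELDS):
    """Build one dict per <row>; a tag missing from a row becomes None."""
    root = ET.fromstring(xml_text)
    records = []
    for row in root.iter("row"):
        records.append({field: row.findtext(field) for field in fields})
    return records


records = rows_to_records(SAMPLE_XML)
print(records)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`, with missing fields showing up as empty cells instead of shifting later values.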

 

🔑 Convert the JSON response to a Pandas DataFrame (recommended ★★★)

service_key = 'YOUR_ISSUED_AUTH_KEY'  # paste the authentication key issued to you
url = f'http://openapi.seoul.go.kr:8088/{service_key}/json/tbLnOpendataRtmsV/1/5/'
print(url)
req = requests.get(url)
content = req.json()
print(content)
# Check the top-level keys
content.keys()

# dict_keys(['tbLnOpendataRtmsV'])
# Use the key to inspect the content
content['tbLnOpendataRtmsV']

# Inspect the records under 'row'
content['tbLnOpendataRtmsV']['row']

# Build a pandas DataFrame
pd.DataFrame(content['tbLnOpendataRtmsV']['row'])
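Indexing `content['tbLnOpendataRtmsV']` directly raises a KeyError when the service key is absent from the response, for example if the request failed. The sketch below guards against that. It assumes that such responses carry a top-level `RESULT` object with `CODE` and `MESSAGE` fields, which is an assumption about the API rather than something shown in this post; `extract_rows` and the sample dictionaries are hypothetical.

```python
def extract_rows(content, service="tbLnOpendataRtmsV"):
    """Return the list under 'row', or raise with the API's own message."""
    if service not in content:
        # Assumed error shape: {"RESULT": {"CODE": ..., "MESSAGE": ...}}
        result = content.get("RESULT", {})
        raise ValueError(
            f"No '{service}' in response: {result.get('CODE')} {result.get('MESSAGE')}"
        )
    return content[service].get("row", [])


# Made-up responses for illustration
ok = {"tbLnOpendataRtmsV": {"list_total_count": 1, "row": [{"ACC_YEAR": "2023"}]}}
bad = {"RESULT": {"CODE": "SAMPLE-ERR", "MESSAGE": "illustrative error"}}

print(extract_rows(ok))
try:
    extract_rows(bad)
except ValueError as err:
    print(err)
```

The returned list can be passed to `pd.DataFrame(...)` exactly as in the code above.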

 
