이창환
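Part 1 Reinforcement Learning

CHAPTER 01 Overview of Reinforcement Learning
1.1 Artificial Intelligence and Machine Learning
1.2 Machine Learning Methods
1.3 Introduction to Reinforcement Learning
1.4 Applications of Reinforcement Learning
1.5 Summary

CHAPTER 02 Markov Decision Processes
2.1 Markov Models
2.2 Markov Reward Processes (MRP)
2.3 Markov Decision Processes (MDP)
2.4 Optimal Value and Optimal Policy
2.5 Partially Observable Markov Decision Processes

CHAPTER 03 Dynamic Programming
3.1 Conditions for Dynamic Programming
3.2 Policy Evaluation
3.3 Control
3.5 Value Iteration
3.6 Generalized Policy Iteration
3.7 Summary

CHAPTER 04 Model-Free Policy Evaluation
4.1 Model-Free Environments
4.2 Monte Carlo Policy Evaluation
4.3 TD Learning
4.4 Batch Learning with Monte Carlo and TD
4.5 TD(n) Learning
4.6 TD(λ) Learning
4.7 Summary

CHAPTER 05 Model-Free Control
5.1 Monte Carlo Generalized Policy Iteration
5.2 ε-Greedy Policy Improvement
5.3 TD Learning
5.4 The Sarsa Method
5.5 Sarsa(λ) Learning
5.6 Off-Policy Learning
5.7 Q-Learning
5.8 Double Q-Learning
5.9 Summary

Part 2 Deep Reinforcement Learning

CHAPTER 06 Value Function Approximation
6.1 Representing Values
6.2 Value Function Approximation Methods
6.3 Gradient Descent
6.4 Learning a Value Function Approximation with Given Targets
6.5 Monte Carlo Function Approximation
6.6 TD Learning Function Approximation
6.7 TD(λ) Function Approximation
6.8 Eligibility Traces
6.9 Value Function Approximation in Model-Free Environments
6.10 Summary

CHAPTER 07 Deep Neural Networks and Optimization
7.1 Artificial Neural Networks
7.2 Training Neural Networks
7.3 Deep Neural Networks
7.4 Types of Deep Neural Networks
7.5 Summary

CHAPTER 08 Deep Q-Networks
8.1 Deep Reinforcement Learning
8.2 Deep Q-Networks
8.3 DQN on Atari Games
8.4 Double DQN
8.5 Dueling DQN
8.6 Recurrent DQN
8.7 Summary

CHAPTER 09 Policy Gradients
9.1 Policy-Based Reinforcement Learning
9.2 Policy Networks
9.3 Policy Objective Functions
9.4 Policy Optimization
9.5 The Policy Gradient Theorem
9.6 The REINFORCE Algorithm
9.7 Actor-Critic Methods
9.8 GAE
9.9 Summary

CHAPTER 10 Advanced Policy Gradients
10.1 A3C
10.2 Maximum Entropy Reinforcement Learning
10.3 TRPO
10.4 PPO
10.5 DDPG
10.6 TD3
10.7 Summary

CHAPTER 11 Imitation Learning
11.1 Reward Prediction
11.2 Behavior Cloning
11.3 DAGGER
11.4 Inverse Reinforcement Learning
11.5 Feature Matching
11.6 Apprenticeship Learning
11.7 GAIL
11.8 Summary

CHAPTER 12 New Directions in Reinforcement Learning
12.1 Multi-Agent Reinforcement Learning
12.2 Hierarchical Reinforcement Learning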