BAHADIR ARABACI

7 Followers


Jun 9

Reinforcement Learning — Mini Glossary — EN/TR

Thanks to HuggingFace for the awesome Deep RL course: https://huggingface.co/learn/deep-rl-course/unit0/introduction?fw=pt

Agent (Ajan): An agent acquires decision-making skills through trial and error, guided by rewards and punishments from its surroundings.
Ajan: Bir ajan, çevresinden gelen ödüller ve cezalarla yönlendirilerek deneme yanılma yoluyla karar verme yetenekleri kazanır.

Environment (Çevre): An environment represents a…

Machine Learning

5 min read


May 31

The “Deep” in Reinforcement Learning

Deep Reinforcement Learning incorporates deep neural networks into the framework of Reinforcement Learning. This integration allows for more complex and high-dimensional representations of states and actions, enabling the agent to learn and make decisions in environments with large state spaces. In traditional Reinforcement Learning approaches, such as Q-Learning, a tabular…

Machine Learning

2 min read
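To make the contrast between a table and a network concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the sizes (`N_STATES`, `STATE_DIM`, `HIDDEN`), the function names, and the tiny two-layer network with random, untrained weights. It only shows the difference in how action values are stored and looked up, not how either approach is trained.

```python
import random

random.seed(0)
N_ACTIONS = 4

# Tabular approach: one stored value per (state, action) pair,
# so memory grows with the number of distinct states.
N_STATES = 16
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def q_tabular(state_index):
    """Look up the action values for a discrete state index."""
    return q_table[state_index]

# Deep approach: a network maps a state *vector* to action values,
# so similar states share parameters instead of separate table rows.
STATE_DIM, HIDDEN = 8, 16
w1 = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(STATE_DIM)]
w2 = [[random.gauss(0, 0.1) for _ in range(N_ACTIONS)] for _ in range(HIDDEN)]

def q_network(state):
    """Forward pass: linear layer, ReLU, linear layer (biases omitted)."""
    hidden = [max(0.0, sum(s * w1[i][j] for i, s in enumerate(state)))
              for j in range(HIDDEN)]
    return [sum(h * w2[j][a] for j, h in enumerate(hidden))
            for a in range(N_ACTIONS)]
```

The table needs an entry for every state it will ever see, while the network accepts any vector of length `STATE_DIM`, which is what makes large or continuous state spaces tractable.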


May 31

Two main approaches for solving RL problems: Policy-Based Methods/Value-Based Methods

Policy-Based Methods: In policy-based methods, the focus is on directly learning the policy function, which maps states to the best corresponding actions. The policy can be deterministic, meaning it always selects the same action for a given state, or stochastic, where it outputs a probability distribution over the set of actions for…

Reinforcement Learning

2 min read
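The difference between a deterministic and a stochastic policy can be sketched in a few lines of Python. The state names, the action list, and the probabilities below are hypothetical values chosen only for illustration; a learned policy would produce them from training.

```python
import random

ACTIONS = ["left", "right", "up", "down"]

def deterministic_policy(state):
    """Always returns the same action for a given state."""
    table = {"start": "right", "wall": "up"}  # hypothetical state-to-action mapping
    return table.get(state, "left")

def stochastic_policy(state):
    """Returns a probability distribution over ACTIONS for the state."""
    if state == "start":
        return [0.1, 0.7, 0.1, 0.1]  # hypothetical probabilities, sum to 1
    return [0.25, 0.25, 0.25, 0.25]

def sample_action(state, rng=random):
    """Draws one action according to the stochastic policy's distribution."""
    probs = stochastic_policy(state)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]
```

Calling `deterministic_policy("start")` returns `"right"` every time, while repeated calls to `sample_action("start")` return `"right"` most often but can return any of the four actions.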


May 31

The Exploration/Exploitation trade-off

The exploration/exploitation trade-off is a fundamental concept in reinforcement learning: the agent must choose between exploring new possibilities and exploiting what it already knows to maximize rewards. Exploration involves taking actions whose outcomes are uncertain or unknown to gather more information about the environment. It allows the agent to discover…

Reinforcement Learning

2 min read
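One common way to balance the two is an epsilon-greedy rule, sketched below. The teaser above does not name a specific strategy, so treat this as an illustrative assumption: `epsilon_greedy`, `q_values`, and `epsilon` are names introduced here, and `q_values` stands for the agent's current value estimate for each action.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon, explore by choosing a random action;
    otherwise exploit by choosing the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

With `epsilon=0.0` the agent always exploits, and with `epsilon=1.0` it always explores. In practice epsilon is often started high and decreased over time, so the agent explores early and exploits later.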


May 31

Two Types of Tasks: Episodic and Continuing

Episodic Task: In an episodic task, the problem has a well-defined starting point and an ending point, also known as a terminal state. Each episode consists of a sequence of states, actions, rewards, and new states that occur from the start to the end. The agent’s goal is to maximize the cumulative…

Reinforcement Learning

1 min read


May 31

Identifying reward functions and the concept of discounted rewards

In reinforcement learning (RL), the reward is the fundamental feedback on the agent’s actions. The reward tells the agent whether the action it took was good or bad by indicating how desirable the outcome was. Positive rewards reinforce actions that lead to favorable outcomes…

Machine Learning

2 min read
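The discounting idea named in the title can be sketched as a short function. This assumes the standard formulation in which each reward is weighted by a discount factor raised to the number of steps before it arrives; the names `discounted_return`, `rewards`, and `gamma` are introduced here for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k.

    rewards: rewards in the order they are received.
    gamma:   discount factor between 0 and 1; smaller values make
             the agent care less about distant rewards.
    """
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of what follows).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, `discounted_return([1, 1, 1], 0.5)` gives `1 + 0.5 + 0.25 = 1.75`, while `gamma = 1` gives the plain sum `3.0`.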


May 31

Observations/States Space

Observations: Observations refer to the information that an agent receives from the environment. In the context of reinforcement learning, observations are typically used to capture the current state or partial information about the state of the environment. Observations can take various forms depending on the specific problem or domain. For…

Machine Learning

2 min read


May 31

Markov Property

The Markov Property is a fundamental concept in Markov Decision Processes (MDPs) that shapes the agent’s decision-making process. It states that the future depends only on the current state, so the agent can choose its action from the current state alone, without the entire history of past states and actions. …

Markov Chains

3 min read
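Written as an equation, the property says that conditioning on the full history gives the same next-state distribution as conditioning on the current state and action alone. The symbols follow the usual MDP notation (S for states, A for actions, t for the time step) and are an assumption about the notation the article uses.

```latex
P\left(S_{t+1} = s' \mid S_t, A_t\right)
  = P\left(S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t\right)
```

A chess position is a common illustration: the board as it stands now contains everything needed to pick the next move, regardless of how the pieces got there.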


May 31

The reward hypothesis

In reinforcement learning, the learning process typically follows a loop that generates a sequence of state, action, reward, and next-state tuples. This sequence is often referred to as a “trajectory.” The agent’s ultimate goal is to maximize its cumulative reward, which is often referred to as the…

Machine Learning

2 min read
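In symbols, a trajectory and the cumulative reward collected along it can be written as below. The indexing convention (the reward R with subscript t+1 follows the action taken at step t) is one common choice and is an assumption here, as is the use of τ for the trajectory.

```latex
\tau = \left(S_0, A_0, R_1, S_1, A_1, R_2, S_2, \dots\right),
\qquad
R(\tau) = R_1 + R_2 + R_3 + \dots = \sum_{t \ge 0} R_{t+1}
```

The reward hypothesis then says that any goal can be expressed as maximizing the expected value of this sum.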


May 31

How does Reinforcement Learning work?

The agent receives the initial state, denoted as S₀, from the environment. In this case, the state represents the first frame of a game. Based on the received state S₀, the agent takes an action A₀. In this example, the agent decides to move to the right. The environment reacts…

Reinforcement Learning

2 min read
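The loop described above (receive a state, take an action, get a reward and the next state) can be sketched with a toy environment. `LineWorld`, `run_episode`, and the reward of 1.0 at the goal are hypothetical names and values made up for this illustration; they are not taken from the article or from any RL library.

```python
class LineWorld:
    """Hypothetical toy environment: walk along a line to reach a goal position."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state S0."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (+1 = right, -1 = left) and return (next_state, reward, done)."""
        self.position = max(0, self.position + action)  # cannot move below 0
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=20):
    """Run the agent-environment loop and record (state, action, reward, next_state) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # agent picks an action from the state
        next_state, reward, done = env.step(action)  # environment reacts
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory
```

Running `run_episode(LineWorld(goal=3), lambda state: 1)` with a policy that always moves right produces three tuples, and only the last one carries a reward of 1.0.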

https://github.com/arabacibahadir
