Unit - 5
Reinforcement and Deep Learning
Reinforcement learning is the method you use when you need to figure out which situation calls for an action, or which actions deliver the most reward over a long period of time. It is concerned with building a learning agent and a reward function, and with estimating the best policy, or way of acting, for collecting the highest reward.
Because of its many distinctive features, reinforcement learning has the following qualities:
● There is no supervisor, only a reward signal or number.
● Making decisions in a sequential order.
● In Reinforcement learning situations, time is key.
● Feedback is never instantaneous; it is always delayed.
● The data that the agent receives is determined by its actions.
A Reinforcement Learning algorithm can be implemented in three ways.
Value-Based:
In a value-based reinforcement learning approach, you try to maximise a value function V(s). With this strategy, the agent expects a long-term return from the states under the current policy.
Policy-based:
In a policy-based RL method, you strive to come up with a policy that allows you to get the most reward in the future by doing actions in each state.
There are two sorts of policy-based methods:
● Deterministic - The policy produces the same action for each state.
● Stochastic - Every action has a probability, which can be calculated using the equation below.
Stochastic Policy
π(a|s) = P[A_t = a | S_t = s]
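The difference between the two policy types can be sketched in Python. The two-state toy environment, its state and action names, and the probabilities below are invented for illustration:

```python
import random

# A deterministic policy maps each state to one fixed action.
deterministic_policy = {"s1": "left", "s2": "right"}

# A stochastic policy assigns a probability to each action in a state:
# pi(a|s) = P[A_t = a | S_t = s].
stochastic_policy = {
    "s1": {"left": 0.7, "right": 0.3},
    "s2": {"left": 0.1, "right": 0.9},
}

def sample_action(policy, state, rng=random):
    """Sample an action from the distribution pi(.|state)."""
    actions, probs = zip(*policy[state].items())
    return rng.choices(actions, weights=probs, k=1)[0]
```

Calling sample_action repeatedly in state "s1" would return "left" about 70% of the time, which is the sense in which every action has a probability.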
Model-Based:
You must develop a virtual model for each environment in this Reinforcement Learning approach. The agent learns how to perform in that particular setting.
Positive Reinforcement learning
Positive reinforcement is defined as the process of encouraging a desired behaviour by providing a reward as a stimulus when that behaviour is demonstrated. In other words, if an activity produces a good response or reward, the response is likely to be reinforced.
The goal is to increase the likelihood of the pattern or behaviour occurring. Positive reinforcers can be desirable outcomes that satisfy needs, such as food, water, money, status, or medals, or they can be barriers that are removed in the process of satisfying a need.
Example
● A commercial for a health drink that shows a rise in height as a result of drinking it is an example of Positive Reinforcement, in which the gain in height is the reinforcer to buy the health drink.
● The Internal Revenue Service is giving gifts to genuine assessees.
● Employees are granted promotions as a kind of reinforcement if they meet their sales targets.
One thing to keep in mind is that a good reinforcer might mean different things to different people. For example, a manager's appreciation can be a powerful reinforcer for some employees, but not for all, and a promotion or increment can be a powerful reinforcer for others.
Negative Reinforcement learning
When a reinforcer, i.e. a stimulus, is eliminated as a result of a specific behaviour, this is referred to as 'negative reinforcement'. Because the unpleasant effect is removed, the frequency of the behaviour will grow in the future. In a nutshell, negative reinforcement refers to the removal of an aversive stimulus. Because the reinforcer is negative and the person tries to avoid or escape from it, it is also known as escape or avoidance learning.
It differs from punishment in that punishment reduces the likelihood of the behaviour occurring, whereas negative reinforcement enhances the likelihood of the behaviour occurring.
Example
● During the rainy season, a person learns to wear a raincoat to avoid getting wet.
● People wear helmets to avoid being hurt in a road accident or being ticketed by the police.
Positive vs Negative Reinforcement learning
BASIS FOR COMPARISON | POSITIVE REINFORCEMENT | NEGATIVE REINFORCEMENT
Meaning | A stimulus is introduced to increase the likelihood that a pattern or behaviour repeats itself. | An unpleasant stimulus is removed to increase the likelihood that a behaviour repeats itself.
Stimulus | Added | Removed
Consequences | The stimulus has a pleasant effect. | The stimulus has an unpleasant effect.
Reinforcer acts as | Reward | Penalty
Results in | Strengthening and maintaining of responses | Avoidance and escape responses
Key takeaway
Positive reinforcement is defined as the process of encouraging a desired behaviour by providing a reward as a stimulus when that behaviour is demonstrated.
When a reinforcer, i.e. stimulus, is eliminated as a result of a specific behaviour, this is referred to as 'negative reinforcement.'
In reinforcement learning, there are two main learning models:
● Markov Decision Process
● Q learning
Markov Decision Process
To formalise reinforcement learning problems, the Markov Decision Process (MDP) is used. If the environment is fully observable, a Markov process can be used to model its dynamics. In an MDP, the agent constantly interacts with the environment and performs actions; the environment responds to each action and generates a new state.
To obtain a solution, the following parameters are used:
● Set of actions - A
● Set of states - S
● Reward - R
● Policy - π
● Value - V
The mathematical framework for mapping out a reinforcement learning solution is called a Markov Decision Process, or MDP for short.
Fig 1: Markov decision process
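A minimal sketch of the five ingredients above as plain Python data structures; the two-state toy problem, its state and action names, and its reward values are all invented for illustration:

```python
# The five MDP ingredients for a hypothetical two-state problem.
S = ["s1", "s2"]                       # set of states
A = ["stay", "move"]                   # set of actions
R = {("s1", "move"): 1.0,              # reward for each (state, action)
     ("s1", "stay"): 0.0,
     ("s2", "move"): 0.0,
     ("s2", "stay"): 0.5}
pi = {"s1": "move", "s2": "stay"}      # policy: maps each state to an action
V = {s: 0.0 for s in S}                # value estimate, one number per state
```

An RL algorithm fills in V (and improves pi) by interacting with the environment; here they are only initialised.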
Markov Property:
"If the agent is in the current state s1, performs action a1, and then moves to the state s2, the state transition from s1 to s2 depends only on the current state and the chosen action; it is independent of past actions, rewards, or states."
In other words, the Markov property says that the present state transition is independent of any previous action or state. As a result, an MDP is an RL problem that satisfies the Markov property. In a game like chess, the players are concerned only with the current state and do not need to remember previous actions or states.
Finite MDP:
There are finite states, finite rewards, and finite actions in a finite MDP. Only the finite MDP is considered in RL.
Markov Process:
The Markov process is a memoryless process that satisfies the Markov property over a sequence of random states S1, S2, …, St. The Markov process, often known as a Markov chain, is a tuple (S, P) consisting of the state set S and the transition probability function P; these two components define the system's dynamics.
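A Markov chain (S, P) can be sketched directly from this definition. The weather states and transition probabilities below are invented for illustration; the key point is that each step depends only on the current state:

```python
import random

# Markov process as a tuple (S, P): states plus transition probabilities.
S = ["sunny", "rainy"]
P = {"sunny": {"sunny": 0.8, "rainy": 0.2},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}

def simulate(start, steps, rng=random):
    """Generate the sequence S1, S2, ..., St; memoryless by construction."""
    chain = [start]
    for _ in range(steps):
        current = chain[-1]
        nxt = rng.choices(list(P[current]),
                          weights=P[current].values(), k=1)[0]
        chain.append(nxt)
    return chain
```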
State
A state is a collection of tokens that represent every possible state for the agent.
Model
A Model (also known as a Transition Model) describes how an action affects a state. T(S, a, S') specifies a transition T that occurs when we are in state S and do action 'a', resulting in state S' (S and S' may be the same). We also construct a probability P(S'|S,a) for stochastic actions (noisy, non-deterministic), which is the chance of reaching a state S' if action 'a' is conducted in state S. Note that the Markov property asserts that the results of an action made in a state are solely determined by that state and not by previous history.
Action
The action set A is the collection of all actions that can be taken. A(s) denotes the set of actions that can be executed while in state s.
Reward
The reward function R assigns a real value to situations: R(S) is the reward for simply being in state S, R(S, a) is the reward for being in state S and taking action 'a', and R(S, a, S') is the reward for being in state S, taking action 'a', and ending up in state S'.
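The transition model and reward function can be sketched together. The two-state problem below is hypothetical; the helper shows how the expected one-step reward follows from P(S'|S, a) and R(S, a, S'):

```python
# Stochastic transition model P(s'|s, a) and reward R(s, a, s') for an
# invented two-state problem with a single action "a".
T = {("s1", "a"): {"s1": 0.2, "s2": 0.8},
     ("s2", "a"): {"s1": 0.5, "s2": 0.5}}
R = {("s1", "a", "s1"): 0.0, ("s1", "a", "s2"): 1.0,
     ("s2", "a", "s1"): 2.0, ("s2", "a", "s2"): 0.0}

def expected_reward(s, a):
    """E[R | s, a] = sum over s' of P(s'|s, a) * R(s, a, s')."""
    return sum(p * R[(s, a, s2)] for s2, p in T[(s, a)].items())
```

Note that each probability distribution T[(s, a)] sums to 1, and the result depends only on the current state and action, matching the Markov property.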
Q-Learning
It is a value-based, model-free method of telling an agent which action to take. It is based on the concept of updating Q-values, which quantify the value of performing action A in state S. The essential feature of the Q-learning algorithm is its value-update rule.
Fig 2: Q - learning
Given the current state, Q-learning is a reinforcement learning method that determines the next best action. During learning it occasionally selects actions at random to explore, with the overall goal of maximising reward.
Fig 3: Components of Q-Learning
Given the present state of the agent, Q-learning is a model-free, off-policy reinforcement learning that will determine the best course of action. The agent will pick what action to take next based on where it is in the environment.
The model's goal is to determine the best course of action given the current situation. To gather experience, the agent may act outside the policy it is learning, for example by exploring at random; because the behaviour used to collect experience need not match the policy being learned, the method is called off-policy.
Model-free means that the agent does not build an explicit model of the environment's dynamics. Instead, it learns directly from trial-and-error interaction, guided by the rewards it receives.
A recommendation system for advertisements is an example of Q-learning. Ads you see in a traditional ad suggestion system are based on previous purchases or websites you've visited. If you've purchased a television, you'll be given recommendations for televisions from various companies.
Fig 4: Ad Recommendation System
Important Terms in Q-Learning
State - The state, S, represents an agent's current position in a given environment.
Action - The action, A, is the action that the agent takes when it is in a certain state.
Reward - The agent receives a positive or negative reward for each action.
Episodes - An episode ends when the agent reaches a terminal state from which it can take no further action.
Q-Values - Used to measure the quality of an action, A, taken in a particular state, S: Q(S, A).
Temporal Difference - A rule for updating the Q-value by combining the immediate reward with the estimated value of the next state and action.
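Putting these terms together, here is a sketch of tabular Q-learning on a tiny, invented chain environment (four states, reward only on reaching the terminal state); the hyperparameters alpha, gamma, and eps are illustrative choices, not values from the text:

```python
import random

# Invented chain environment: states 0..3, actions -1 (left) and +1
# (right), reward 1.0 only on reaching the terminal state 3.
N_STATES, TERMINAL = 4, 3
ACTIONS = [-1, +1]
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic environment: move, clipped to the ends of the chain."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

rng = random.Random(0)
for _ in range(500):                 # episodes
    s = 0
    while s != TERMINAL:
        # Epsilon-greedy action selection (off-policy behaviour).
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Temporal-difference update toward r + gamma * max_a' Q(s', a').
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
```

After training, the greedy action in every non-terminal state is +1, i.e. move toward the rewarding terminal state.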
Key takeaway
To formalise reinforcement learning problems, the Markov Decision Process (MDP) is used. If the environment is totally observable, a Markov Process can be used to model its dynamics.
It is based on the concept of updating Q values, which demonstrates the value of performing action A in state S.
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. Just as neural networks imitate the human brain, so does deep learning. In deep learning, nothing is programmed explicitly. Basically, it is a class of machine learning that uses numerous nonlinear processing units to perform feature extraction and transformation, with each successive layer taking the output of the preceding layer as its input.
Deep learning models can focus on the right features by themselves, requiring little guidance from the programmer, and are very helpful with the problem of dimensionality. Deep learning algorithms are used especially when we have a huge number of inputs and outputs.
Since deep learning evolved from machine learning, which is itself a subset of artificial intelligence, and since the idea behind artificial intelligence is to mimic human behaviour, the idea of deep learning is likewise to build algorithms that can mimic the brain.
Deep learning is implemented with the help of neural networks, and the motivation behind neural networks is the biological neuron, which is nothing but a brain cell.
Deep learning is a collection of statistical techniques of machine learning for learning feature hierarchies that are actually based on artificial neural networks.
So basically, deep learning is implemented by the help of deep networks, which are nothing but neural networks with multiple hidden layers.
Characteristics
Deep Learning has the following characteristics:
1. Supervised, Semi-Supervised or Unsupervised
Supervised learning occurs when category labels are available while the model is being trained; algorithms such as linear regression, decision trees, and logistic regression use supervised learning. Unsupervised learning occurs when category labels are unknown during training; algorithms such as cluster analysis, k-means clustering, and anomaly detection use unsupervised learning. Semi-supervised learning occurs when the data set contains both labelled and unlabelled data; it is used in graph-based models and generative models, under cluster and continuity assumptions.
2. Huge Amount of Resources
Processing huge workloads requires sophisticated graphics processing units (GPUs). Big data, which can be structured or unstructured, involves processing a large volume of data, and the more data is fed in, the more time processing may take.
3. Large Amount of Layers in Model
A large number of layers, such as input, activation, and output layers, will be necessary. The output of one layer is fed to the next, with each layer making small detections that are finally combined, for example in a softmax layer, to determine a broader classification for the final output.
4. Optimizing Hyper-parameters
Hyperparameters such as the number of epochs, batch size, number of layers, and learning rate must be fine-tuned for good model accuracy, since they shape the relationship between layer predictions and the final output prediction. Well-chosen hyperparameters can effectively handle overfitting and underfitting.
5. Cost Function
It expresses how well the model performs in terms of accuracy and prediction. The purpose of each iteration in a deep learning model is to reduce the cost relative to previous iterations. Depending on the technique used, common forms include mean absolute error, mean squared error, hinge loss, and cross-entropy.
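The cost functions named above can be written out directly. The two toy vectors below are invented; the cross-entropy form assumes the predictions are probabilities in (0, 1):

```python
import math

# Toy targets and predictions for illustration.
y_true = [1.0, 0.0, 1.0]
y_pred = [0.9, 0.2, 0.8]
n = len(y_true)

# Mean absolute error: average of |target - prediction|.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Mean squared error: average of (target - prediction)^2.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# Binary cross-entropy, assuming predictions are probabilities in (0, 1).
bce = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
           for t, p in zip(y_true, y_pred)) / n
```

Each of these goes down as predictions move toward the targets, which is exactly the quantity an iteration tries to reduce.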
Biological neural networks establish the structure of the human brain, and the phrase "Artificial Neural Network" is taken from them. Artificial neural networks, like the human brain, have neurons that are coupled to one another in various layers of the networks. Nodes are the name for these neurons.
Fig 5: Neurons
The diagram seen here is a typical Biological Neural Network diagram.
The standard Artificial Neural Network resembles the illustration below.
Fig 6: ANN
In artificial neural networks, dendrites from biological neural networks correspond to inputs, cell nuclei to nodes, synapses to weights, and axons to outputs.
In the science of artificial intelligence, an Artificial Neural Network seeks to duplicate the network of neurons that make up a human brain so that computers can understand things and make judgments in a human-like fashion. Computers are programmed to act like interconnected brain cells in order to create an artificial neural network.
The human brain contains roughly 100 billion neurons. Each neuron has between 1,000 and 100,000 connection points. Data is stored in the human brain in a distributed way, and we can pull multiple pieces of this data from memory at the same time when needed. The human brain, we may say, is an incredible parallel processor.
Consider the example of a digital logic gate, which receives inputs and produces an output, to better understand the artificial neural network. The "OR" gate accepts two inputs: if one or both inputs are "On", the output is "On"; if both inputs are "Off", the output is "Off". Here the output is fully determined by the input. Our brains do not work this way: because the brain's neurons are constantly learning, the relationship between inputs and outputs keeps changing.
Architectures
To comprehend artificial neural network architecture, we must first understand what a neural network is made of: a large number of artificial neurons, called units, arranged in layers. Let's take a look at the various layers that may be found in an artificial neural network.
The Artificial Neural Network is made up of three layers:
Fig 7: Architecture of ANN
1. Input layer
It accepts inputs in a variety of formats specified by the programmer, as the name implies.
2. Hidden layer
The hidden layer sits between the input and output layers. It performs all the computation needed to uncover hidden features and patterns.
3. Output layer
The input goes through a series of transformations in the hidden layers, finally producing the output that is conveyed through this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and includes a bias. This computation is represented in the form of a transfer function.
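This computation can be sketched as a single artificial neuron: a weighted sum of the inputs plus a bias, passed through a transfer function. The sigmoid is used here as one common choice of transfer function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum + bias, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid transfer function
```

For example, with inputs [1.0, 2.0], weights [0.5, -0.25], and bias 0.0, the weighted sum is exactly zero, so the sigmoid returns 0.5.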
Advantages
● Parallel processing capability:
The numerical nature of artificial neural networks allows them to perform more than one task at the same time.
● Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, data in an ANN is stored across the entire network. The network continues to function even if a few pieces of data in one location are lost.
● Capability to work with incomplete knowledge:
Even with incomplete information, a trained ANN may still produce output. The performance loss in this case depends on the importance of the missing data.
● Having a memory distribution:
For an ANN to be able to adapt, suitable examples must be determined and shown to the network along with the intended output; what the network learns is then stored in a distributed way across its weights.
● Having fault tolerance:
The loss of one or more ANN cells does not prevent the network from producing output, and this feature makes the network fault-tolerant.
Disadvantages
● Assurance of proper network structure:
There is no fixed rule for determining the structure of an artificial neural network; the ideal network structure is found through experience and trial and error.
● Unrecognized behavior of the network:
This is ANN's most significant problem. When an ANN produces a solution, it gives no indication of why or how that solution was reached, which erodes trust in the network.
● Hardware dependence:
Artificial neural networks, due to their structure, require computers with parallel processing power. The realisation of the network is therefore dependent on suitable hardware.
● Difficulty of showing the issue to the network:
ANNs can only work with numerical data, so problems must be translated into numerical values before being introduced to the ANN. The representation chosen here directly influences the network's performance and depends on the user's skill.
● The duration of the network is unknown:
Training stops when the network's error falls to a specified value; this stopping criterion does not guarantee that the best possible results have been obtained.
Key takeaway
Neural networks are mathematical models that use learning algorithms inspired by the brain to store information.
A major focus of machine-learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.
Neural networks are a popular framework to perform machine learning, but there are many other machine-learning methods, such as logistic regression, and support vector machines.
Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and output an activation value.
Convolutional neural networks are made up of two basic components: convolutional layers and pooling layers.
There are nearly limitless ways to assemble these layers for a particular computer vision issue, despite their simplicity.
Convolutional and pooling layers, for example, are rather simple to comprehend in a convolutional neural network.
The most difficult aspect of employing convolutional neural networks in reality is figuring out how to create model topologies that make the most of these basic components.
Fig 8: Convolutional Neural Networks
The behaviour of each neuron is defined by its weights. When fed with the pixel values, the artificial neurons of a CNN pick out various visual features.
When you input an image into a ConvNet, each of its layers generates several activation maps. Activation maps highlight the relevant features of the image. Each of the neurons takes a patch of pixels as input, multiplies their colour values by its weights, sums them up, and runs them through the activation function.
The first (or bottom) layer of the CNN usually detects basic features such as horizontal, vertical, and diagonal edges. The output of the first layer is fed as input of the next layer, which extracts more complex features, such as corners and combinations of edges. As you move deeper into the convolutional neural network, the layers start detecting higher-level features such as objects, faces, and more.
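The patch-by-patch weighted sum described above is exactly a 2D convolution. Below is a pure-Python sketch without padding or stride, together with a hypothetical vertical-edge kernel; in a real CNN these kernel weights would be learned, not hand-written:

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; each output value is the weighted
    sum of one pixel patch (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

# A vertical-edge kernel: responds strongly where pixel values change
# from left to right, weakly in flat regions.
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
```

Stacking such filters, interleaved with pooling layers, gives the edge-to-corner-to-object hierarchy described above.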
Key takeaway
Convolutional neural networks are made up of two basic components: convolutional layers and pooling layers.
Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and output an activation value.
Manufacturing
Deep reinforcement learning is used by robots in several automobile manufacturing industries to pick goods and place them in containers.
Control
Reinforcement learning can be used for adaptive control, for example in factory processes, admission control in telecommunications, and helicopter piloting.
Robotics
RL is used in robot navigation, Robo-soccer, walking, juggling, and other applications.
Game Playing
Tic-tac-toe, chess, and other games can be played with RL.
Chemistry
Chemical reactions can benefit from the usage of RL.
Business
RL is being utilised to design company strategy.
Applications of deep learning
Self Driving Cars
Deep learning is used in self-driving cars to analyse data gathered from cars driven across various terrains such as mountains, deserts, and plains. Data from sensors, public cameras, and other sources can be collected to aid in the testing and deployment of self-driving cars. During training, the system must be able to ensure that all scenarios are handled correctly.
Healthcare
Deep learning has played a significant part in everything from medical image analysis to disease treatment, especially when GPU processors are used. It also assists physicians and clinicians in diagnosing patients, treating them with appropriate medication, and keeping them out of danger.
Stock Analysis
Quantitative equity analysts are gaining more advantages, particularly in terms of determining whether a stock's trend will be bullish or bearish, and they can use many more factors when training the deep learning layers, such as the number of transactions made, the number of buyers, the number of sellers, the previous day's closing balance, and so on. When training the deep learning layers, qualitative equity analysts consider criteria such as return on equity, P/E ratio, Return on Asset, Dividend, Return on Capital Employed, Profit per Employee, Total Cash, and so on.
Fraud Detection
Hackers, particularly those operating on the dark web, have developed software that allows them to steal money digitally across the globe. Deep learning will learn to detect fraudulent transactions on the internet based on a variety of parameters such as router information, IP addresses, and so on. Financial firms can also save billions of dollars by using autoencoders. Finding outliers and researching them can also help detect these types of fraudulent transactions.
Image Recognition
If the city police department has a people database for the city and they want to know who is involved in crimes or violence in public gatherings, they can use public webcams to find out who is involved. Deep learning using CNN (Convolution Neural Networks) can help a lot in finding the person who was involved in the act.
News Analysis
The government is putting in a lot of effort these days, particularly to prevent the spread of fake news and to determine its source. Deep learning can also help predict the outcome of polls, such as who will win an election in terms of popularity, using variables such as which candidate is shared most on social media and analysis of tweets made by the public. However, there are limitations, such as not knowing whether the data is genuine or fake, or whether the information has been spread by bots.