# PSYCH448D-07-Accessible.pdf

Andrea Stocco

University of Washington

Seattle, WA

PSYCH448D, Week 5: Models of Memory /1

Summary of RL

RL is a general framework that explains learning

The V-values and Q-values are a form of memory

There are multiple memory systems in the brain

Different types of memory systems

Seger, 2005 (adapted from Squire & Zola-Morgan, 1992)

Different types of memory systems: RL

Seger, 2005 (adapted from Squire & Zola-Morgan, 1992)

Memory systems in the brain

Some questions to start the day

What is memory?

What are the characteristics of human memory?

What are the limitations of human memory?

What is memory?

Characteristic of memory

● It starts with an event
  ○ The hand press
● The event leaves a trace
  ○ The handprint
● Time will make the trace fade away
  ○ The wind
● More events will make the trace deeper

Rational Analysis (i.e., Bayesian framework)

● Influential branch of mathematical models
● Assumptions
  ○ Assumes agents are adapted to their environment
  ○ Evolution has already done the job of optimizing the agent
● Consequence
  ○ If you know the environment and the agent's goals, you can mathematically derive what the organism will do
● Advantage
  ○ Specification of the agent is minimal

A timeline of Bayesian models of memory

John R. Anderson (1990s)

Nick Chater (2000s)

Thomas Griffiths (2010s)

* Has since taken a different approach

A rational approach to memory

● Memory has evolved to efficiently retrieve information
  ○ Suppose that retrieval of information took constant time T
  ○ Very frequent memories would then take the same time as very infrequent ones
  ○ Then computational resources are not well allocated!
  ○ Useful only when T ~ 0!
● Cost-effective approach: for each memory trace M, we can consider its need probability, p(M)
● Then, retrieval time should be T ~ 1/p(M)
  ○ That is, resources spent to make M available should be proportional to M's usefulness.
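The relation T ~ 1/p(M) can be illustrated with a minimal Python sketch. The function name `retrieval_time`, the `scale` constant, and the example need probabilities are all hypothetical; the slides only state the proportionality.

```python
def retrieval_time(need_probability, scale=1.0):
    """Retrieval time inversely proportional to need probability: T = scale / p(M).

    `scale` is an assumed proportionality constant; the slides give only T ~ 1/p(M).
    """
    if not 0.0 < need_probability <= 1.0:
        raise ValueError("need_probability must be in (0, 1]")
    return scale / need_probability


# Hypothetical need probabilities for three memory traces.
need = {
    "own phone number": 0.5,
    "coworker's name": 0.1,
    "old locker code": 0.01,
}

times = {memory: retrieval_time(p) for memory, p in need.items()}
for memory, t in times.items():
    print(f"{memory}: T = {t:.0f}")
```

Under these assumed numbers, the rarely needed memory is the slowest to retrieve, which is the allocation of resources the slide argues for.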

A rational theory of memory retrieval

It is useful to think of rational analysis as an economic theory

● Memories are useful to achieve goals
● Each goal has a value, V
  ○ Like the sum of rewards R_t in RL
● But memory retrieval has a cost, C
● Thus the net value of retrieving memory M is p(M) * V – C

A rational theory of forgetting

● Forgetting is not bad

● When would it be good to forget something at all?

● Theory: retrieving memories has a cost C
  ○ Example costs: attention, time to pause, etc.

● When p(M) * V < C, it is rational to forget M!
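The net value p(M) * V – C and the forgetting rule p(M) * V < C can be written out directly. This is a minimal sketch; the function names and the numeric values of V and C are hypothetical and chosen only for illustration.

```python
def net_value(need_probability, goal_value, retrieval_cost):
    """Net value of retrieving memory M: p(M) * V - C."""
    return need_probability * goal_value - retrieval_cost


def should_forget(need_probability, goal_value, retrieval_cost):
    """Forgetting is rational when the expected gain p(M) * V is below the cost C."""
    return need_probability * goal_value < retrieval_cost


# Hypothetical values: a goal worth V = 10 and a retrieval cost of C = 1.
V, C = 10.0, 1.0
for p in (0.5, 0.05):
    print(p, net_value(p, V, C), should_forget(p, V, C))
```

With these assumed numbers, the memory with p(M) = 0.5 is worth keeping, while the one with p(M) = 0.05 has a negative net value and is rationally forgotten.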

Interim summary

● An agent's behavior can be analyzed as a function of the environment
● We assume the agent is optimal
● We assume that the agent is rational (minimizes costs)
● Memory decay and forgetting can be seen as rational

Anderson and Milson (1989)

● Laid out the principles of rational analysis
● Considered memory as an information retrieval system
  ○ That is, a database system
● The system retrieves records in response to queries Q
● Applied Bayesian principles to derive optimal behavior
  ○ What an ideal system should do

Bayes theorem

● Bayesian approaches to cognition are now ubiquitous
● Note that:

$$p(A \,\&\, B) = p(A) \times p(B \mid A)$$

$$p(A \,\&\, B) = p(B) \times p(A \mid B)$$

● We can establish equality:

$$p(B) \times p(A \mid B) = p(A) \times p(B \mid A)$$

$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$

[Figure: diagram relating p(A), p(B), p(A & B) = p(A) p(B | A), and p(A | B)]
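A small numeric check of the theorem may help. The sketch below uses hypothetical probabilities for two events A and B, computes p(B) by the law of total probability, and confirms that both factorizations of p(A & B) agree.

```python
def posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Bayes theorem: p(A | B) = p(A) * p(B | A) / p(B)."""
    return prior_a * likelihood_b_given_a / marginal_b


# Hypothetical probabilities, chosen only for illustration.
p_a = 0.2              # p(A)
p_b_given_a = 0.9      # p(B | A)
p_b_given_not_a = 0.3  # p(B | not A)

# Law of total probability: p(B) = p(A) p(B | A) + p(not A) p(B | not A).
p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a

p_a_given_b = posterior(p_a, p_b_given_a, p_b)

# Both factorizations of the joint probability p(A & B) should match.
joint_via_a = p_a * p_b_given_a
joint_via_b = p_b * p_a_given_b
print(p_a_given_b, joint_via_a, joint_via_b)
```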

Bayes theorem: Nomenclature

$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$

● p(A | B): Posterior probability
● p(A): Prior probability
● p(B | A): Likelihood
● p(B): Marginal likelihood (“model evidence”)

Bayes theorem: More nomenclature

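$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$

● p(A | B): Posterior probability of A
● p(A): Prior probability of A
● p(B | A): Likelihood of B
● p(B): Marginal probability of B (“model evidence”)

Equivalently:

$$p(A \mid B) = p(A) \times \frac{p(B \mid A)}{p(B)}$$

● p(A): Prior probability of A
● p(B | A) / p(B): Support for B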

Bayes theorem: Meaning

$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)} = p(A) \times \frac{p(B \mid A)}{p(B)}$$

● p(A | B): Posterior probability
● p(A): Prior probability
● p(B | A) / p(B): Contextual factors

Memory as information system

● Contextual factors = Q (for “query”, as in a database)
● The context Q is made of several cue elements q_i
● Each cue q_i contributes to the probability of needing A.

$$p(A \mid Q) = p(A) \times \frac{p(Q \mid A)}{p(Q)}$$

● p(A | Q): Posterior probability
● p(A): Prior probability
● p(Q | A) / p(Q): Contextual factors

From probabilities to odds

● For ease, Anderson and Milson work in terms of odds, not probabilities.
● The math changes a little
● The range of values is now [0, ∞), instead of [0, 1]

$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$
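The odds form follows from writing Bayes theorem once for A and once for ¬A and dividing the two, which cancels p(Q). The Python sketch below checks, with hypothetical numbers, that the odds form and the probability form give the same posterior; the helper names are illustrative only.

```python
def prob_to_odds(p):
    """Convert a probability in [0, 1) to odds in [0, infinity)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds in [0, infinity) back to a probability in [0, 1)."""
    return odds / (1.0 + odds)


def posterior_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds * likelihood ratio p(Q | A) / p(Q | not A)."""
    return prior_odds * likelihood_ratio


# Hypothetical probabilities, chosen only for illustration.
p_a = 0.2
p_q_given_a = 0.9
p_q_given_not_a = 0.3

# Odds form.
odds = posterior_odds(prob_to_odds(p_a), p_q_given_a / p_q_given_not_a)
p_from_odds = odds_to_prob(odds)

# Probability form, with p(Q) from the law of total probability.
p_q = p_a * p_q_given_a + (1.0 - p_a) * p_q_given_not_a
p_direct = p_a * p_q_given_a / p_q

print(odds, p_from_odds, p_direct)
```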

Prior probabilities

● Prior probabilities reflect the history of A, independent of any contextual factors.
● How do we characterize the history of A?

$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$

Calculating priors

Factors that affect priors:

● The importance of a memory

● The time since a memory has been created
  ○ The more time passes, the more evidence we have that an event is unlikely
  ○ Example: waiting times at customer service

Calculating priors: The book rental problem

● Anderson uses Burrell's (1980) model of library book rentals:

$$\frac{p(A)}{p(\neg A)} = \frac{n + v}{t + b} \times r(t)$$

● Where
  ○ n is the number of times A has been used since it was created
  ○ t is the time since A's creation
  ○ r(t) is decay over time; think of λ^t with 0 < λ < 1 in RL
  ○ v and b are initial parameters (you can think of them as the mean uses and mean age of all memories)
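To see how frequency and recency shape the prior odds, here is a minimal Python sketch of the expression (n + v) / (t + b) × r(t). The slides do not give a concrete decay function, so r(t) = λ^t is an assumption taken from the RL analogy, and the values of v, b, and λ are hypothetical.

```python
def prior_odds(n, t, v, b, decay_rate=0.99):
    """Prior odds of needing memory A: (n + v) / (t + b) * r(t).

    r(t) = decay_rate ** t is an assumed decay function (the lambda^t analogy);
    v, b, and decay_rate are hypothetical parameter values.
    """
    r_t = decay_rate ** t
    return (n + v) / (t + b) * r_t


# Hypothetical parameters for the population of memories.
v, b = 2.0, 10.0

# More uses (larger n) at the same age raise the prior odds.
rare = prior_odds(n=1, t=5, v=v, b=b)
frequent = prior_odds(n=10, t=5, v=v, b=b)

# The same number of uses spread over a longer time lowers the prior odds.
recent = prior_odds(n=5, t=10, v=v, b=b)
old = prior_odds(n=5, t=100, v=v, b=b)

print(rare, frequent, recent, old)
```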

Contextual factors

$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$

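Breaking down contextual factors

● Contextual factors are broken down into individual cues q_i

$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$

$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} = \prod_i \frac{p(q_i \mid A)}{p(q_i \mid \neg A)}$$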

Simplifying contextual factors

● Contextual factors are broken down into individual cues q_i
● As the number of memories increases, p(q_i | ¬A) ➡ p(q_i)

$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} = \prod_i \frac{p(q_i \mid A)}{p(q_i \mid \neg A)}$$

$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} \approx \prod_i \frac{p(q_i \mid A)}{p(q_i)}$$
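The simplification p(q_i | ¬A) ➡ p(q_i) can be checked numerically. By the law of total probability, p(q) = p(A) p(q | A) + p(¬A) p(q | ¬A), so as any single memory A becomes a smaller share of all memories, p(q) approaches p(q | ¬A). The sketch below assumes, purely for illustration, that each of N memories is equally likely to be needed, so p(A) = 1/N; the conditional probabilities are hypothetical.

```python
def marginal_cue_probability(p_a, p_q_given_a, p_q_given_not_a):
    """Law of total probability: p(q) = p(A) p(q | A) + p(not A) p(q | not A)."""
    return p_a * p_q_given_a + (1.0 - p_a) * p_q_given_not_a


# Hypothetical conditional probabilities of observing cue q.
p_q_given_a = 0.8
p_q_given_not_a = 0.1

gaps = []
for n_memories in (10, 100, 1000, 10000):
    p_a = 1.0 / n_memories  # assumption: all memories equally likely to be needed
    p_q = marginal_cue_probability(p_a, p_q_given_a, p_q_given_not_a)
    gaps.append(abs(p_q - p_q_given_not_a))
    print(n_memories, p_q)
```

The gap between p(q) and p(q | ¬A) shrinks in proportion to 1/N under this assumption, which is why the denominator can be replaced by p(q_i) for a large memory store.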

Interpreting contextual factors

● Interpretation is simple: the probability of finding q_i whenever A is present…
● … over the absolute probability of finding q_i
● Can be easily calculated from discrete corpora
  ○ E.g., language corpora, with q_i and A being words in the same sentence.

$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} \approx \prod_i \frac{p(q_i \mid A)}{p(q_i)}$$
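As an illustration of the corpus-based calculation, the sketch below estimates p(q_i | A) / p(q_i) from sentence co-occurrence counts and multiplies the ratios across cues. The six-sentence corpus and the function names are hypothetical; a real analysis would use a large corpus and would need to handle words that never occur (which would cause a division by zero here).

```python
from math import prod

# A tiny hypothetical corpus; each sentence is reduced to its set of words.
corpus = [
    {"coffee", "cup", "morning"},
    {"coffee", "cup", "table"},
    {"tea", "cup", "morning"},
    {"dog", "walk", "morning"},
    {"coffee", "beans", "shop"},
    {"dog", "ball", "park"},
]


def cue_ratio(cue, target, sentences):
    """Estimate p(cue | target) / p(cue) from sentence co-occurrence counts."""
    with_target = [s for s in sentences if target in s]
    p_cue_given_target = sum(cue in s for s in with_target) / len(with_target)
    p_cue = sum(cue in s for s in sentences) / len(sentences)
    return p_cue_given_target / p_cue


def contextual_factor(cues, target, sentences):
    """Product over cues q_i of p(q_i | A) / p(q_i)."""
    return prod(cue_ratio(q, target, sentences) for q in cues)


print(cue_ratio("cup", "coffee", corpus))      # cue that co-occurs with the target
print(cue_ratio("dog", "coffee", corpus))      # cue that never co-occurs with it
print(contextual_factor(["cup", "morning"], "coffee", corpus))
```

A ratio above 1 means the cue makes the target memory more likely to be needed than its base rate suggests; a ratio below 1 means the cue counts against it.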

Godden & Baddeley, 1975

Summary: Rational analysis

● Retrieval probability = need function = posterior probability of needing memory A given context Q
● Uses economic considerations (costs and values) and Bayesian inference to infer laws for priors and likelihoods
● Priors reflect history of use, dominated by frequency (n) and recency (t).
● Likelihood reflects contextual cues q; each cue q_i contributes in proportion to its previous co-occurrence with A

Is human memory truly rational?

● In 1989 and 1990, Anderson published several papers outlining his conclusions
● In 1991, he set out to find an empirical test.
● If human memory is rational, it should adapt to the human environment
  ○ Again, almost no assumptions about the agent

Ebbinghaus (1885)

● First experimental dataset of human memory
  ○ Used nonsense syllables
  ○ Measured how long it took to relearn (savings) at different delays
● Classic result in psychology
● Typically interpreted as an effect of memory decay

Exponential or power decay?

Regular form

● Exponential law: $P = \alpha^{-\beta T}$
● Power law: $P = \alpha T^{-\beta}$

Log transform

● Exponential law: $\log(P) = -\beta T \log(\alpha)$
● Power law: $\log(P) = \log(\alpha) - \beta \log(T)$

The power law is linear when time T and probability P are both in logs!
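One way to see the difference between the two laws is to compute the slope between consecutive points on a log(T) versus log(P) plot. The sketch below does this for both forms given on the slide, using hypothetical parameter values; the helper name `log_log_slopes` is illustrative.

```python
import math


def log_log_slopes(times, values):
    """Slopes between consecutive points on a log(T) vs. log(P) plot."""
    points = [(math.log(t), math.log(p)) for t, p in zip(times, values)]
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


# Hypothetical parameters, chosen only to make the contrast visible.
alpha, beta = 2.0, 0.5
times = [1, 2, 4, 8, 16]

power = [alpha * t ** -beta for t in times]          # P = alpha * T^(-beta)
exponential = [alpha ** (-beta * t) for t in times]  # P = alpha^(-beta * T)

power_slopes = log_log_slopes(times, power)
exp_slopes = log_log_slopes(times, exponential)

print(power_slopes)  # constant slope, equal to -beta
print(exp_slopes)    # slope keeps getting steeper
```

A constant slope means a straight line in log-log coordinates, which is the signature of the power law; the exponential law curves downward on the same axes.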

Estimating need probabilities

● Remember that we are estimating odds.
● When we transform them back into probabilities (need function), the results show a power function between p(A) and t
  ○ $p(A) = \alpha T^{-\beta}$
● The log/log plot is linear

Exponential or power law?

Log transform

● Exponential law 🚫: $\log(P) = -\beta T \log(\alpha)$
● Power law ✅: $\log(P) = \log(\alpha) - \beta \log(T)$

We can estimate parameters: α = 3.86, β = 0.13
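Because the power law is a straight line in log-log coordinates, α and β can be estimated with ordinary least squares on log(T) and log(P). The sketch below is not the original analysis: it generates noiseless synthetic points from the power law using the parameter values quoted on the slide (the delays and their units are hypothetical) and checks that the fit recovers them.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_power_law(times, values):
    """Fit P = alpha * T^(-beta) via log(P) = log(alpha) - beta * log(T)."""
    slope, intercept = fit_line(
        [math.log(t) for t in times],
        [math.log(p) for p in values],
    )
    return math.exp(intercept), -slope  # (alpha, beta)


# Synthetic data generated from the slide's parameter estimates.
alpha_true, beta_true = 3.86, 0.13
delays = [1, 2, 4, 8, 16, 32, 64]  # hypothetical delays, arbitrary time units
retention = [alpha_true * t ** -beta_true for t in delays]

alpha_hat, beta_hat = fit_power_law(delays, retention)
print(alpha_hat, beta_hat)
```

With real, noisy data the recovered values would only approximate the underlying parameters, and the quality of the linear fit in log-log space is itself evidence for or against the power law.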

Is human memory truly rational? (Reprise)

● In 1989 and 1990, Anderson published several papers outlining his conclusions
● In 1991, he set out to find an empirical test.
● If human memory is rational, it should adapt to the human environment
  ○ Environment ⇔ Human memory
● Therefore, the same parameters that govern human memory should be found in the environment.

Anderson & Schooler, 1991

Tested three modern sources of information

1. 100 days of New York Times headlines in 1990

2. 100 days of child-directed speech (CHILDES database)

3. 100 days of emails received by John Anderson in 1990

The same power function and the same parameters should hold across domains

● (Remember: Ebbinghaus parameters (α = 3.86, β = 0.13) were in probabilities, not odds!)

Anderson & Schooler, 1991: Results

How about social media?

Anderson et al, in press.

Discussion question

What do you think of Bayesian approaches?

What are the limits of Bayesian approaches?

- PSYCH448D, Week 5 Models of Memory /1
- Summary of RL
- Different types of memory systems
- Different types of memory systems: RL
- Memory systems in the brain
- Some questions to start the day
- What is memory?
- Characteristic of memory
- Rational Analysis (i.e., Bayesian framework)
- A timeline of Bayesian models of memory
- A rational approach to memory
- A rational theory of memory retrieval
- A rational theory of forgetting
- Interim summary
- Anderson and Milson (1989)
- Bayes theorem
- Bayes theorem: Nomenclature
- Bayes theorem: More nomenclature
- Bayes theorem: Meaning
- Memory as information system
- From probabilities to odds
- Prior probabilities
- Calculating priors
- Calculating priors: The book rental problem
- Contextual factors
- Breaking down contextual factors
- Simplifying contextual factors
- Interpreting contextual factors
- Godden & Baddeley, 1975
- Summary: Rational analysis
- Is human memory truly rational?
- Ebbinghaus (1885)
- Exponential or power decay?
- Estimating need probabilities
- Exponential or power law?
- Is human memory truly rational? (Reprise)
- Anderson & Schooler, 1991
- Anderson & Schooler, 1991: Results
- How about social media?
- Discussion question