- HW 01 (due Tu, Jan 22)
intro; basic movement
- HW 02 (due Tu, Jan 29)
kinesis, pillbugs, Ecoli
- HW 03 (due Tu, Feb 05)
paramecium, taxis
- HW 04 (due Tu, Feb 12)
food/photo-taxis, collisions
- HW 05 (due Tu, Feb 19)
informational cues
- HW 06 (due Tu, Feb 26)
edge following
- HW 07 (due Tu, Mar 04)
indiv project (DRAFT)
- Proj #1 (due Tu, Mar 11)
indiv project (FINAL)
- HW 08 (due Tu, Apr 01)
bee foraging
- HW 09 (due Tu, Apr 08)
ant trails
- HW 10 (due Mo, Apr 14)
project peer eval
- Proj #2 (due Mo, Apr 28)
showcase projects
Homework
MCB 419 Homework 8 (Spring 2008)
Questions appearing in red are to be answered in the
hw08.txt file.
When you've finished the assignment, email your responses
and a copy of your project file to
mcb419@gmail.com with 'hw08' in the subject line.
The homework is due by 11:59PM on Tue, Apr 1.
Reinforcement learning
BeeForage
Learning to cope with uncertainty in nectar reward values.
In this project, you'll develop a 'brain' for a honeybee-inspired
bot. The bee's goal is to collect as much nectar as possible from the
orange and blue flowers in the time allotted.
The bee has color-selectable 'virtual' sensors
(snsL and snsR) that can be
used to navigate toward or away from particular targets.
The sensor color preference is determined by the bee's targetColor variable.
If you're not familiar with virtual sensors, you might want to review
the following examples:
example 1,
example 2,
example 3.
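As a rough illustration of how a pair of left/right sensors can be used for steering, here is a minimal Python sketch. It is not the BeeForage API: the function name, the `gain` parameter, and the sign convention (positive turn means turn left) are assumptions made for this example. It also assumes that a larger sensor value means a stronger signal from the target color.

```python
def steering_turn(sns_l: float, sns_r: float,
                  gain: float = 1.0, approach: bool = True) -> float:
    """Return a heading change from two sensor readings.

    Hypothetical helper, not part of the simulator.
    Assumed convention: positive result = turn left, negative = turn right.
    """
    # A stronger left reading means the target lies to the left,
    # so the difference points toward the target.
    turn = gain * (sns_l - sns_r)
    # Flip the sign to steer away from the target instead of toward it.
    return turn if approach else -turn


# Target is stronger on the left: turn left to approach it.
print(steering_turn(0.8, 0.2) > 0)
# Same readings, but avoiding the target: turn right.
print(steering_turn(0.8, 0.2, approach=False) < 0)
```

With equal readings the function returns zero, so the bot keeps its current heading when the target is straight ahead or not detected at all.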
- The "good" flower color provides a nectar reward of +1 and the "bad" color
provides -1. The initial selection of
which flower color is "good" is randomized. The reward pattern switches
approximately every 15 seconds (the "good" color becomes "bad" and vice-versa).
- In order for the bee to receive a nectar reward, it must remain in
contact with the 'yellow' center of the target for 10 clock ticks (1.25 sec).
- When the flower delivers its nectar reward (+1 or -1, depending on color),
the bee's
energy is updated accordingly, and the center of the flower turns
gray to indicate that the flower has been 'depleted'.
- Depleted flowers are refilled when the reward pattern changes (every 15 sec).
- The simulation will stop automatically when the elapsed
time reaches 100 seconds.
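The timing figures above can be converted to clock ticks, which may be handy if your controller counts ticks. The short Python sketch below does the arithmetic; the constant and function names are made up for this example, and the switch interval is only approximate, as noted above.

```python
# Values taken from the rules above: 10 clock ticks correspond to 1.25 seconds.
TICKS_FOR_REWARD = 10
SECONDS_FOR_REWARD = 1.25
TICKS_PER_SECOND = TICKS_FOR_REWARD / SECONDS_FOR_REWARD  # 8 ticks per second


def seconds_to_ticks(seconds: float) -> int:
    """Convert simulation seconds to clock ticks (hypothetical helper)."""
    return round(seconds * TICKS_PER_SECOND)


RUN_TICKS = seconds_to_ticks(100)    # length of one full run
SWITCH_TICKS = seconds_to_ticks(15)  # approximate interval between reward switches
FULL_PERIODS = 100 // 15             # complete 15-second periods in one run

print(TICKS_PER_SECOND, RUN_TICKS, SWITCH_TICKS, FULL_PERIODS)
```

So a 100-second run lasts 800 ticks, the reward pattern changes roughly every 120 ticks, and a run contains about six such changes.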
Assignment
1. Run the simulation with the bee's default controller and the
target color set to 'yellow' (default). Briefly describe the
bee's behavior. How much nectar did it collect in 100 s (i.e., what
was the final 'score')? Why was the score so low?
2. Reset the simulation and set the bee's target color to 'orange'
using the appropriate button in the panel to the right of the playfield.
Run the simulation for 100 s. How much nectar did the bee collect in 100 s?
In what way did the pattern of nectar rewards differ from the previous (yellow) run?
Now develop your own controller code that allows the bee to
maximize its nectar reward over 100 seconds.
(Note: an average score of 10 should be easy to obtain;
15 or more is good; above 20 is great).
You are free to make the controller as complex as necessary
to solve the task, subject to the following guidelines:
- You CANNOT access any playfield or flower variables directly from
your controller code. You must use the bee's sensor values and its
own internal variables to guide behavior.
- You ARE allowed to access the bee's x, y and heading variables, if desired.
- You ARE allowed to use the
bounce function, if desired.
- You ARE allowed to call updateSensors for multiple target colors on
a single time step, if desired.
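As one possible starting point that stays within these guidelines, the Python sketch below shows a "win-stay, lose-shift" rule for choosing the target color using only the bee's own energy value. It is an illustration, not the expected solution and not simulator code: the class and method names are hypothetical, and it assumes that the bee's energy changes only when a flower delivers its nectar reward.

```python
class WinStayLoseShift:
    """Pick a target color from changes in the bee's own energy.

    Hypothetical helper. Assumes energy changes only on nectar delivery.
    """

    def __init__(self, colors=("orange", "blue"), start_energy=0.0):
        self.colors = colors
        self.target = colors[0]          # arbitrary initial guess
        self.last_energy = start_energy

    def update(self, energy: float) -> str:
        """Call once per time step with the current energy; returns the target color."""
        delta = energy - self.last_energy
        self.last_energy = energy
        if delta < 0:
            # Negative reward: the current color is "bad", so switch to the other one.
            first, second = self.colors
            self.target = second if self.target == first else first
        # Positive reward or no change: keep the current target.
        return self.target


policy = WinStayLoseShift()
for energy in [0, 1, 2, 1, 1, 2]:
    print(energy, policy.update(energy))
```

In this usage example the target stays "orange" while the energy rises, switches to "blue" at the first drop in energy, and then stays "blue". A rule like this reacts to a reward switch only after one bad visit, so it costs at most one point per switch under the stated assumption.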
3. How will your controller decide if the bee has received a positive
or negative reward? How will it decide which color is currently "good"
and which is "bad"? What movement strategy and action-selection policy
will you use to maximize visits to "good", non-depleted flowers?
4. For your final controller design, record the total nectar reward
obtained on each of 3 CONSECUTIVE simulation runs (100 s):