



Homework

MCB 419 Homework 8 (Spring 2008)


Questions appearing in red are to be answered in the hw08.txt file.

When you've finished the assignment, email your responses and a copy of your project file to mcb419@gmail.com with 'hw08' in the subject line.

The homework is due by 11:59PM on Tue, Apr 1.


Reinforcement learning

BeeForage
Learning to cope with uncertainty in nectar reward values.

In this project, you'll develop a 'brain' for a honeybee-inspired bot. The bee's goal is to collect as much nectar as possible from the orange and blue flowers in the time allotted.

The bee has color-selectable 'virtual' sensors (snsL and snsR) that can be used to navigate toward or away from particular targets. The sensor color preference is determined by the bee's targetColor variable. If you're not familiar with virtual sensors, you might want to review the following examples: example 1, example 2, example 3.

  • The "good" flower color provides a nectar reward of +1 and the "bad" color provides -1. The initial selection of which flower color is "good" is randomized. The reward pattern switches approximately every 15 seconds (the "good" color becomes "bad" and vice-versa).
  • In order for the bee to receive a nectar reward, it must remain in contact with the 'yellow' center of the target for 10 clock ticks (1.25 sec).
  • When the flower delivers its nectar reward (+1 or -1, depending on color), the bee's energy is updated accordingly, and the center of the flower turns gray to indicate that the flower has been 'depleted'.
  • Depleted flowers are refilled when the reward pattern changes (every 15 sec).
  • The simulation will stop automatically when the elapsed time reaches 100 seconds.
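The timing rules above can be turned into a few useful numbers. The following is a minimal Python sketch, not part of the simulation itself: the constant names and the `good_color` helper are hypothetical, and the helper assumes an idealized switch at exactly 15-second intervals, whereas the real pattern switches only approximately that often.

```python
# Timing constants taken from the rules listed above.
TICKS_FOR_REWARD = 10        # contact ticks needed before nectar is delivered
SECONDS_FOR_REWARD = 1.25    # the same dwell time, in seconds
SWITCH_PERIOD_S = 15.0       # approximate interval between reward switches
RUN_LENGTH_S = 100.0         # the simulation stops at this elapsed time

tick_s = SECONDS_FOR_REWARD / TICKS_FOR_REWARD    # duration of one clock tick
ticks_per_second = 1 / tick_s
ticks_per_switch = SWITCH_PERIOD_S / tick_s
total_ticks = RUN_LENGTH_S / tick_s

# Upper bound on rewards per switch period if the bee did nothing but dwell
# (travel time between flowers makes the real number smaller).
max_dwells_per_period = int(SWITCH_PERIOD_S // SECONDS_FOR_REWARD)


def good_color(t_seconds, initial_good="orange"):
    """Color paying +1 at time t, ASSUMING exact 15 s switches (idealized)."""
    other = "blue" if initial_good == "orange" else "orange"
    n_switches = int(t_seconds // SWITCH_PERIOD_S)
    return initial_good if n_switches % 2 == 0 else other
```

Under these assumptions one tick lasts 0.125 s, so the reward pattern holds for roughly 120 ticks at a time, and each reward costs at least 10 of those ticks in dwell time alone.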


Assignment

1. Run the simulation with the bee's default controller and the target color set to 'yellow' (default). Briefly describe the bee's behavior. How much nectar did it collect in 100 s (i.e., what was the final 'score')? Why was the score so low?

2. Reset the simulation and set the bee's target color to 'orange' using the appropriate button in the panel to the right of the playfield. Run the simulation for 100 s. How much nectar did the bee collect in 100 s? In what way did the pattern of nectar rewards differ from the previous (yellow) run?

Now develop your own controller code that allows the bee to maximize its nectar reward over 100 seconds. (Note: an average score of 10 should be easy to obtain; 15 or more is good; above 20 is great). You are free to make the controller as complex as necessary to solve the task, subject to the following guidelines:

  • You CANNOT access any playfield or flower variables directly from your controller code. You must use the bee's sensor values and its own internal variables to guide behavior.
  • You ARE allowed to access the bee's x, y and heading variables, if desired.
  • You ARE allowed to use the bounce function, if desired.
  • You ARE allowed to call updateSensors for multiple target colors on a single time step, if desired.
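As one illustration of a controller that stays within these guidelines, the sketch below shows a simple win-stay, lose-shift rule written in Python rather than in the project's own language. It is only one possible starting point, not the expected solution. The class name, the `update` method, and its arguments are hypothetical; the sketch assumes the controller can read the bee's own energy each tick, that energy changes only when nectar is delivered, and that the controller knows which flower color the bee was dwelling on.

```python
class WinStayLoseShift:
    """Pick a target color from the sign of the most recent energy change."""

    def __init__(self, colors=("orange", "blue")):
        self.colors = colors
        self.target = colors[0]      # arbitrary initial guess
        self.last_energy = None      # energy seen on the previous tick

    def _other(self, color):
        return self.colors[1] if color == self.colors[0] else self.colors[0]

    def update(self, energy, visited_color):
        """Call once per tick; returns the color to steer toward.

        energy        -- the bee's current energy (an internal variable)
        visited_color -- color of the flower being visited, or None
        """
        if self.last_energy is not None and visited_color is not None:
            delta = energy - self.last_energy
            if delta > 0:
                # Positive reward: this color is currently "good", so stay.
                self.target = visited_color
            elif delta < 0:
                # Negative reward: this color is currently "bad", so shift.
                self.target = self._other(visited_color)
        self.last_energy = energy
        return self.target
```

A controller built on this idea would call `update` every tick and pass the returned color to whatever mechanism sets the bee's targetColor. Because the reward pattern switches roughly every 15 seconds, a single negative reward is enough to flip the target, at the cost of one lost point per switch.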

3. How will your controller decide if the bee has received a positive or negative reward? How will it decide which color is currently "good" and which is "bad"? What movement strategy and action-selection policy will you use to maximize visits to "good", non-depleted flowers?

4. For your final controller design, record the total nectar reward obtained on each of 3 CONSECUTIVE simulation runs (100 s):

 


Copyright © Mark E. Nelson, University of Illinois at Urbana-Champaign, 2005-2008. All rights reserved.