up

Problem Set 2 (4.5 - 18.5)

In this exercise we want to explore self-organizing maps (SOM, see Wikipedia). The algorihm works in the following way:

  1. Start by defining a network of nodes. Each node is given random coordinates in the space in which the data lies. The distance between two nodes A and B in the network is defined by the minimal number of edeges one has to pass to get from A to B.
  2. Pick a random data node and find the nearest node (the winner), using the euclidean distance in the data space. Move the winner node towards the data point. Also move the neighbouring nodes in the network (e.g. those that have a network distance 1 to the winner node) towards the data points, but by a smaller amount.
  3. Repeat the previous step in a loop. Use a learning rate that defines how far the nodes are moved in each step (i.e. by acting as a factor). The learning rate should be decreased over time, but slow enough for the network to adapt to the data distribution.

Your implementation should have the following features:
  1. Implement the algorithm for both a 1D line network (where each node has two neighbours except at the borders) and 2D mesh networks (four neightbours, so that it forms a grid). Note that no wrap around can be used. Both version should be derived from one SOM base class.
  2. Write a unittest to check that a single update step works without raising an exception (i.e. just run a single update step, if an error occurs the test automatically fails).
  3. Pick a data distribution of your choice (e.g. a Gaussian data cloud) and train the two SOM types. Visualize both the data and the SOM in matplotlib (you can use the normal plot command, e.g. with "o" for the points and plotting one line at a time). An animation could help with the debugging and picking a good learning rate.

Note that since you work in groups you can distribute the work (e.g. one person writes the unit test while the other workds on the plotting).