Problem Set 4 (14.5 - 21.5.07)
This problem set consists of two parts. You have to solve one of the three exercises. The first one is rather simple, so it is most instructive if you try the second one first. If you find that the second exercise is too hard, you can then go back to the first one.
The first exercise refers to exercise 3.1 from the analytical tutorial. We want to verify the theoretical findings by computational experiments.
- Generate a one-dimensional data distribution with two zones of different constant data density (e.g. one zone from -10 to 0 and one from 0 to 10). Then calculate the optimal distribution of reference vectors and compare their density to that of the data. The data set and the number of reference vectors should be rather large (to approximate the smooth theoretical case).
- Repeat the first part of this exercise with a higher-dimensional data set. Try to plot the distribution of the reference vectors over the data zones. How does it depend on the dimensionality?
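The one-dimensional experiment could be set up roughly as follows. This is only a sketch under stated assumptions: it uses batch k-means (Lloyd's algorithm) with a squared-error criterion as the way to find the reference vectors, and the names `make_two_zone_data` and `lloyd_1d`, the 4:1 density ratio, the number of reference vectors and the iteration count are all illustrative choices, not values prescribed by the exercise.

```python
import numpy as np

def make_two_zone_data(n_dense, n_sparse, rng):
    """Uniform samples: n_dense points in [-10, 0) and n_sparse points in [0, 10)."""
    return np.concatenate([rng.uniform(-10.0, 0.0, n_dense),
                           rng.uniform(0.0, 10.0, n_sparse)])

def lloyd_1d(data, n_refs, n_iter):
    """Batch k-means (Lloyd's algorithm) for scalar data (assumed method)."""
    data = np.sort(data)
    prefix = np.concatenate([[0.0], np.cumsum(data)])  # prefix sums for fast cell means
    # Start with equally populated cells, i.e. reference density equal to data density.
    refs = np.quantile(data, (np.arange(n_refs) + 0.5) / n_refs)
    for _ in range(n_iter):
        # In 1D each cell is bounded by the midpoints between neighbouring references.
        edges = (refs[:-1] + refs[1:]) / 2.0
        idx = np.concatenate([[0], np.searchsorted(data, edges), [data.size]])
        counts = np.diff(idx)
        sums = np.diff(prefix[idx])
        occupied = counts > 0
        # Move every reference vector to the mean of the data in its cell.
        refs[occupied] = sums[occupied] / counts[occupied]
    return refs

rng = np.random.default_rng(0)
n_dense, n_sparse = 40000, 10000  # illustrative data density ratio of 4:1
data = make_two_zone_data(n_dense, n_sparse, rng)
refs = lloyd_1d(data, n_refs=100, n_iter=10000)

n_left = int(np.sum(refs < 0.0))
n_right = refs.size - n_left
print("data density ratio (left/right):     ", n_dense / n_sparse)
print("reference density ratio (left/right):", n_left / n_right)
```

Both zones have the same length, so the ratio of reference vector counts is directly the ratio of their densities and can be compared with the prediction from the analytical tutorial. Convergence is slow because reference vectors only migrate gradually across the zone boundary, which is why the iteration count is large. For the higher-dimensional part the midpoint trick no longer applies and the cells have to be found by a nearest-neighbour search.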
The other two exercises deal with slow feature analysis (SFA), which was briefly introduced in the lecture. To get the technical details, take a look at this paper or come to the tutorial. You are also invited to use the MDP library for Python (Modular Toolkit for Data Processing), which provides an SFA implementation. This library is a very nice example of a library you might use in larger research projects. Getting used to working with a third-party library might therefore be quite useful. Alternatively, you can of course write your own SFA implementation.
Exercise 3 is more difficult than exercise 2, but the stimulus is much more interesting.
- Create sample data for three floating-point functions on the integers from 1 to 100. The first two functions should vary slowly (e.g. sine waves with different wavelengths and phases). The third one should be fast (e.g. just pick random values or use a very short wavelength).
Now create three different linear mixtures of the sampling data (i.e. for each mixture you define a constant weight for each function and then sum over the weighted values at each point in time).
Implement SFA yourself or use MDP. Process the three mixtures with it (so the input signal is three-dimensional). Can you extract the two slow signals? Plot both the input and the output signals in a comprehensible way (try to make it look good, and don't forget to label your plot).
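If you write your own implementation, the linear case can be sketched in a few lines of NumPy. The following is a minimal sketch, not a reference solution: it whitens the input and then picks the directions in which the finite-difference derivative has the smallest variance. The function name `linear_sfa`, the wavelengths, the phase, the random seed and the mixing weights are all made up for illustration.

```python
import numpy as np

def linear_sfa(x):
    """Linear SFA for x of shape (time, dims); returns outputs ordered slowest first."""
    x = x - x.mean(axis=0)
    # Whiten: rotate onto the principal axes and scale each to unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (eigvec / np.sqrt(eigval))
    # Approximate the time derivative by finite differences.
    dz = np.diff(z, axis=0)
    # eigh returns eigenvalues in ascending order, so the first column of
    # `rot` is the direction in which the whitened signal changes most slowly.
    _, rot = np.linalg.eigh(dz.T @ dz / dz.shape[0])
    return z @ rot

t = np.arange(1, 101)
slow1 = np.sin(2 * np.pi * t / 100)        # illustrative wavelength of 100 samples
slow2 = np.sin(2 * np.pi * t / 50 + 1.0)   # illustrative wavelength of 50 samples
fast = np.random.default_rng(0).standard_normal(t.size)
sources = np.column_stack([slow1, slow2, fast])

# Each row holds the constant weights of one mixture (arbitrary example values).
mixing = np.array([[1.0, 0.5, 0.3],
                   [0.4, 1.0, -0.6],
                   [-0.7, 0.2, 1.0]])
mixtures = sources @ mixing.T
outputs = linear_sfa(mixtures)

for k, name in enumerate(["slow1", "slow2"]):
    c = np.corrcoef(outputs[:, k], sources[:, k])[0, 1]
    print(f"|corr(output {k}, {name})| = {abs(c):.3f}")
```

The sign of each output is arbitrary, which is why the absolute correlation is reported. If you use MDP instead, `mdp.nodes.SFANode` takes the place of `linear_sfa` here.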
We now want to take a slightly more interesting signal. Therefore we take a one-dimensional "retina" with gray values (e.g. 15 pixels with values from 0 to 255). Create a background and a local structure that moves slowly in front of the background (e.g. a 5-pixel stripe pattern which moves periodically from left to right).
Visualize the generated data in a 2D image by stacking the 1D signals (i.e. using the y-axis for time). Display the image directly or store it.
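One possible way to generate such a stimulus is sketched below. Everything specific in it is an assumption made for illustration: the function name `make_retina_movie`, a static random background, a bright/dark stripe pattern, and a sawtooth motion that sweeps from left to right once per period and then jumps back.

```python
import numpy as np

def make_retina_movie(n_frames=200, n_pixels=15, period=50, seed=0):
    """Return an array of shape (n_frames, n_pixels) with gray values in 0..255."""
    rng = np.random.default_rng(seed)
    # Static random background (one possible choice).
    background = rng.integers(0, 256, size=n_pixels).astype(float)
    # 5-pixel stripe pattern that occludes the background.
    stripe = np.array([255.0, 0.0, 255.0, 0.0, 255.0])
    frames = np.tile(background, (n_frames, 1))
    max_offset = n_pixels - stripe.size
    for t in range(n_frames):
        # Sawtooth position: 0 at the start of a period, max_offset at its end.
        offset = int(round(max_offset * (t % period) / (period - 1)))
        frames[t, offset:offset + stripe.size] = stripe
    return frames

frames = make_retina_movie()
print(frames.shape)
```

Because every row of `frames` is one time step, the array already is the stacked image. With matplotlib it can be shown using `matplotlib.pyplot.imshow(frames, cmap="gray", aspect="auto")`. Note that this particular sketch contains only 11 distinct frames, so the covariance matrix of the 15 pixels is rank-deficient; adding a little pixel noise or reducing the dimension with PCA before whitening avoids a division by zero in SFA.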
Process the data with the SFA algorithm and plot the first three output signals (i.e. the slowest ones). If the result is not satisfactory, you can try a nonlinear expansion of the input signal (e.g. increase the dimension of the input signal by including quadratic terms).
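A quadratic expansion of the kind mentioned above could look like the following sketch. The function name `quadratic_expansion` is hypothetical; it keeps the original components and appends every product of two components, so a 15-dimensional input becomes 135-dimensional.

```python
import numpy as np

def quadratic_expansion(x):
    """Append all products x_i * x_j with i <= j to the columns of x (time, dims)."""
    rows, cols = np.triu_indices(x.shape[1])
    return np.hstack([x, x[:, rows] * x[:, cols]])

example = np.array([[1.0, 2.0, 3.0]])
expanded = quadratic_expansion(example)
print(expanded)  # linear terms 1, 2, 3 followed by products 1, 2, 3, 4, 6, 9
```

The expanded signal is then fed into linear SFA in place of the raw pixels. Since the dimension grows quadratically, the number of time steps should be large compared with the expanded dimension. MDP offers comparable functionality in `mdp.nodes.QuadraticExpansionNode`.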