Saturday, 26 September 2015

Does Google dream of electric dogs?

You may have heard about Google's Deep Dreaming network, turning completely normal pictures of clouds into surreal cityscapes.
I was curious about what would happen if I asked the network to dream about the cosmic microwave background (CMB), the baby photo of the Universe. The CMB is the background radiation left over by the Big Bang and is one of the most important probes of the Universe that we have. As a pattern, it looks pretty close to noise so I had no way of knowing what might come out of Google's dreams.

What is Deep Dreaming? It all came from search-engine giant Google's desire to solve the image-recognition problem. This is a good example of something that humans do very easily at just a few weeks of age, but that is notoriously difficult to get a computer to do. However, Google does seem to be cracking the problem with its deep learning algorithms. They're using convolutional neural networks, a type of neural network with many layers. So it seems Google has had the most success with a machine learning algorithm that tries to imitate the human brain.

Deep Dreaming is what I suspect the people at Google came up with while they were trying to understand exactly what was going on inside their algorithms (machine learning can be a dark art). The original blog post is excellent so I won't repeat it here but will rather show you what I got when playing around with their code.

You can actually submit any image you like here and get an 'inceptionised' image out with default settings, but I wasn't happy with the result (the default settings seem optimised to pick up eyes) so I decided to dig deeper. For the technically inclined, it's not that hard to run your own network and change the settings to get the strange images in Google's gallery. Everything you need is in the publicly available GitHub repository, including a great IPython notebook for doing this yourself.

The CMB as seen by Planck
I took a (relatively low-resolution) image of the CMB from the Planck satellite to test the deep dreaming methods on. Basically all of the DeepDream images are produced by picking something and then telling the neural net to try and make the input image as close to that thing as possible. It's the something you pick that will change the images you get out. 

Google's trained convolutional neural network is over a hundred layers deep, each layer corresponding to some particular things the network has seen in the training set. When running the network on an image, you can pick out one of the layers and enhance it.
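
For the technically inclined, "enhancing a layer" boils down to gradient ascent on the input image: you nudge the pixels in whatever direction makes the chosen layer respond more strongly. The sketch below is a toy illustration of that idea and not Google's code. The "layer" is a hypothetical random linear map followed by a ReLU, standing in for a real trained layer, and the names `layer_activations`, `objective` and `dream_step` are made up for this example.

```python
import numpy as np

def layer_activations(image, weights):
    """Toy stand-in for one network layer: a linear map followed by a ReLU."""
    return np.maximum(weights @ image.ravel(), 0.0)

def objective(image, weights):
    """How strongly the layer responds: half the sum of squared activations."""
    return 0.5 * np.sum(layer_activations(image, weights) ** 2)

def dream_step(image, weights, step_size=0.1):
    """One gradient-ascent step that increases the layer's response."""
    acts = layer_activations(image, weights)
    # Gradient of 0.5 * sum(relu(W x)^2) with respect to x is W^T relu(W x).
    grad = (weights.T @ acts).reshape(image.shape)
    # Scale the step by the mean absolute gradient so step_size is in pixel units.
    grad = grad / (np.abs(grad).mean() + 1e-8)
    return image + step_size * grad

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 64))   # hypothetical "trained" layer
image = rng.normal(size=(8, 8))       # noise-like input, a bit like the CMB

print("response before:", objective(image, weights))
for _ in range(10):
    image = dream_step(image, weights)
print("response after: ", objective(image, weights))
```

With a real network the gradient comes from backpropagation through all the layers below the one you picked, but the loop has the same shape.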

Picking out higher level layers tends to bring out smaller, more angular features whereas lower levels tend to be more impressionistic, smoother features.

Low level layer - more impressionistic features
High level layer - more angular features

Then you can also ask the network to try and find something particular within your image. To the technically inclined, this is equivalent to placing a strong prior on the network so that it optimises for a particular image. In this case, I asked it to find flowers in the CMB.

"Guide" image
Generated image

Now comes the fun stuff. You ask the network to generate an image as before, focusing on a specific layer. Then you take the output image, perform some small transform (like zooming in a little) and run it again. This is Google's dreaming. You can start with almost any image you like. Playing around I eventually found "the dog layer" where it continuously found dogs in the sky. 
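
The feedback loop itself is simple enough to sketch in a few lines. This is only an illustration under stated assumptions: `enhance` is a hypothetical placeholder for one pass of the network (here just a made-up pixel-wise function), and `zoom_in` uses a crude nearest-neighbour resample rather than whatever transform you would use for a real video.

```python
import numpy as np

def zoom_in(image, border=1):
    """Crop `border` pixels from every edge, then resample back to the
    original size with a nearest-neighbour lookup."""
    n_rows, n_cols = image.shape
    rows = np.round(np.linspace(border, n_rows - 1 - border, n_rows)).astype(int)
    cols = np.round(np.linspace(border, n_cols - 1 - border, n_cols)).astype(int)
    return image[np.ix_(rows, cols)]

def dream_sequence(image, enhance, n_frames):
    """Repeatedly enhance the image, zoom in a little and feed it back in."""
    frames = [image]
    for _ in range(n_frames):
        image = zoom_in(enhance(image))
        frames.append(image)
    return frames

rng = np.random.default_rng(1)
start = rng.normal(size=(32, 32))
# Placeholder for the network: any function mapping an image to an image works.
frames = dream_sequence(start, enhance=lambda img: img + 0.1 * np.tanh(img), n_frames=5)
print(len(frames), frames[-1].shape)
```

Each entry in `frames` would be one frame of the video, so the dream drifts further from the starting image with every iteration.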

The CMB very quickly goes to the dogs
After a while it starts finding microwaves too

I made a video of a few hundred iterations of this (note: I lengthened the first few frames otherwise they go by way too fast). I find it really appropriate that after a while, it starts finding what really looks like a bunch of microwaves, in the cosmic microwave background radiation... Enjoy the show!

Sunday, 13 September 2015

Cosmic cats on the radio

Thanks to my Nuffield student Rose Yemelyanova for creating the great cat images as part of her project and letting me use them here. 

The title may not sound scientific but it turns out cosmic cats are a great way to explain radio interferometry, a notoriously tricky topic, so I have good reason for the quirky wording.

It may not surprise you to know that radio telescopes see the world very differently to us. For many years astronomers have explored the Universe across the entire range of the electromagnetic spectrum, including the radio. A radio telescope is not, in fact, particularly different from a TV satellite dish; it's just pointed away from those signal-beaming satellites and out into space instead.

The world would look quite different to us if we could see in the radio. For one thing we'd be able to see through walls, but it'd also be terribly noisy with all those cell phone and TV signals bouncing around. But to our best type of radio telescopes, radio interferometers, the world looks very different indeed.

This is because interferometers are like broken telescopes. See, here's the problem in astronomy: bigger is always better. The bigger your telescope, the better your resolution. The largest single-dish radio telescope in the world is Arecibo in Puerto Rico. With a 300m diameter, it is so large it had to be built in a massive natural sinkhole. This is pretty much as big as it's going to get for telescopes: it's just not physically possible to support that much steel, especially if you want to actually be able to move it (something Arecibo can't do...). Fortunately, we can cheat. We can split up a gigantic telescope into lots of smaller ones, and it turns out to work just as well.

This is called interferometry and it works a bit like your ears. A sound signal reaches each ear at a slightly different time, but instead of hearing a mess, your brain is clever enough to put the signals together and figure out roughly where the sound came from in the first place. Interferometers are like the ears of astronomy (and the brain bit is called a correlator, a complex and fast computer). The great thing about this is that with interferometry we can build telescopes that act like they're 36km wide, such as the VLA (which is an ingenious acronym for Very Large Array). This gives us exquisite resolution in radio astronomical images.
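
The "slightly different time" can be made concrete. For two antennas a distance B apart, a signal arriving at an angle θ away from the direction perpendicular to the line joining them reaches one antenna later than the other by B sin(θ) / c. The sketch below just evaluates that textbook relation and inverts it; the function names are made up for this example and it ignores everything a real correlator has to worry about.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def geometric_delay(baseline_m, angle_rad):
    """Extra travel time to the far antenna for a source at `angle_rad`
    from the direction perpendicular to the baseline."""
    return baseline_m * math.sin(angle_rad) / SPEED_OF_LIGHT

def arrival_angle(baseline_m, delay_s):
    """Invert the delay to recover the direction the signal came from."""
    return math.asin(delay_s * SPEED_OF_LIGHT / baseline_m)

delay = geometric_delay(baseline_m=36_000.0, angle_rad=math.radians(10.0))
print("delay in microseconds:", delay * 1e6)
print("recovered angle in degrees:", math.degrees(arrival_angle(36_000.0, delay)))
```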

The problem with radio interferometry is that it can be tricky to understand if you want to go into a bit more detail than "they're like ears". The thing is, you now have a big telescope with lots of gaps, instead of having a single continuous dish. So the question is, what does a telescope like this actually see? If you make an image from radio data, do you see gaps like missing pieces of a puzzle or is it something else? 

The intuitive answer, that gaps between telescopes equal gaps in an image, is not right, but then very little in interferometry is intuitive. What's important to a radio interferometer is scale. Two antennas far apart pick up the smallest scales (so very fine detail) whereas antennas close together see only the large scales, and pick up none of the details. It's like the difference between investigating the pattern on a single brick in the wall and looking at the entire cathedral (the details on a single brick are small scale while the whole cathedral is large scale). The crucial quantity is the baseline: the distance between any two antennas is a baseline, and the limiting resolution of the instrument depends on its longest baseline. It's as if you had dozens of ears: the largest distance between any two would dictate how well you'd be able to pinpoint the exact location of any sound. Sort of.

The best explanations I've ever seen of how this works are this (Maths-free!) description of the Fourier transform and this explanation of how interferometers work from the ALMA website. Since those websites do such a great job explaining how radio telescopes see the world, I'm going to talk about cats instead.

Have a look at this picture of an adorable kitten. What we did (because, well, we can) was take this picture of a cat, put it in the sky, simulate a telescope observing it and have a look at what the telescope would see. So this picture can be seen as the "true sky", what you would see if you had a perfect telescope. But we don't have purrfect... sorry, perfect telescopes so we simulated some real, imperfect ones.

We looked at three telescopes: 
  • KAT-7: a 7-antenna array in South Africa, which is a small precursor to the massive Square Kilometre Array (maximum baseline 185m).
  • WSRT: a well-established 14-antenna array in Westerbork, Netherlands (maximum baseline 2.7km).
  • VLA: a premier radio interferometer in New Mexico, with 27 antennas (maximum baseline 36km).
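
To get a rough feel for how different these three arrays are, you can count their baselines (every pair of antennas gives one, so N antennas give N(N-1)/2) and estimate the finest detail each can resolve, which is roughly the observing wavelength divided by the longest baseline. The sketch below does just that. The 21cm wavelength is my assumption for illustration only, not necessarily what was used in the simulations, and the function names are made up.

```python
import math

def n_baselines(n_antennas):
    """Every distinct pair of antennas forms one baseline."""
    return n_antennas * (n_antennas - 1) // 2

def resolution_arcsec(wavelength_m, max_baseline_m):
    """Rule-of-thumb angular resolution, wavelength / baseline, in arcseconds."""
    return math.degrees(wavelength_m / max_baseline_m) * 3600.0

WAVELENGTH_M = 0.21  # assumed observing wavelength (the 21cm hydrogen line)

telescopes = {
    "KAT-7": (7, 185.0),
    "WSRT": (14, 2700.0),
    "VLA": (27, 36000.0),
}

for name, (n_antennas, max_baseline_m) in telescopes.items():
    print(name, n_baselines(n_antennas), "baselines,",
          round(resolution_arcsec(WAVELENGTH_M, max_baseline_m), 1), "arcsec")
```

Under that assumption the three arrays go from roughly 4 arcminutes to about 16 arcseconds to just over 1 arcsecond, while the number of baselines climbs from 21 to 91 to 351.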
So, what happens if you put that kitten on the sky and observe it with KAT-7? Well, you get an image that looks like this. If you're really creative (and you already know it's a cat), you can sort of make out a head and some ears. Other than that, it just looks like a lot of mess. Not at all cat-like. This is because you've got too many gaps and not long enough baselines. You can get a sense for the large-scale information, but you lose all the details. So the problem here is both that there are not enough antennas at different baselines, and that the longest baseline is quite short so we're only getting large scales.

Now what happens if we look at the same kitten with the WSRT? This telescope has more baselines, because it has more antennas, and they're also farther apart than KAT-7. And indeed, it's starting to look like a cat. We can make out the basic shape, but still don't have any of the details. So we have more baselines (antennas at different distances), but we still don't have long enough baselines to pick out the details on the cat's face.

OK, let's look at this cat with the world's current best interferometer, the VLA. Suddenly, it actually looks like a cat! The extra and longer baselines of the VLA are able to fill in the details that the WSRT was missing, and so we are finally able to see our adorable cosmic kitten.

However, you may still notice some junk around the kitten. It's still not a perfect reconstruction and this is simply because by definition, you will always have gaps in an interferometer. We can't put them infinitely close together to fill in all the gaps, and we wouldn't want to because it would be expensive. If you only have 27 antennas, then you want to put them at a variety of distances apart to get a range of baselines, and hence be able to see both the detail of the cat's face and its general outline as well, although you don't see either one perfectly.
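
If you want to play with this yourself, the essence of what an interferometer does to an image can be sketched with a Fourier transform: each baseline measures one spatial frequency of the sky, so an array with gaps only keeps some of them. The toy below is not the simulation used for the cat images. It uses a made-up random sampling pattern (`radial_mask`, with hypothetical parameters) instead of a real antenna layout, and it skips everything else a real telescope simulation would include.

```python
import numpy as np

def radial_mask(shape, max_radius, fill_fraction, seed=0):
    """Toy coverage: randomly sample a fraction of the spatial frequencies
    out to `max_radius`, which plays the role of the longest baseline."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.hypot(fy[:, None], fx[None, :])
    mask = (radius <= max_radius) & (rng.random(shape) < fill_fraction)
    # A baseline measures a frequency and its mirror image together,
    # so make the mask symmetric under k -> -k.
    mirrored = np.roll(mask[::-1, ::-1], shift=(1, 1), axis=(0, 1))
    return mask | mirrored

def observe(true_sky, mask):
    """Keep only the sampled Fourier components and transform back."""
    visibilities = np.fft.fft2(true_sky)
    return np.real(np.fft.ifft2(visibilities * mask))

# A blocky stand-in for the kitten: one big shape plus some fine detail.
sky = np.zeros((64, 64))
sky[20:44, 20:44] = 1.0
sky[30:32, 30:32] = 3.0

for max_radius in (4, 12, 32):
    image = observe(sky, radial_mask(sky.shape, max_radius, fill_fraction=0.5))
    rms = np.sqrt(np.mean((image - sky) ** 2))
    print("longest 'baseline'", max_radius, "-> rms error", round(float(rms), 3))
```

Short "baselines" recover the blurry outline, longer ones bring back the detail, and the missing frequencies in between are what produce the junk around the image.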

Some telescopes are designed to look at the large scale and do big surveys so they tend to cluster all the antennas close together (in fact, with its new technology LOFAR can see the entire sky at once). Some telescopes however are designed to go really deep and at high resolution on a small patch of sky and thus the antennas are placed farther apart. The SKA is so great (and so expensive) because it's going to try and do both, with lots and lots (2000+) of antennas...

The conclusion: radio interferometry has ushered in a new age of incredibly high-resolution radio astronomical images, generations of confused grad students (ploughing through Fourier transforms and dirty images) and now, a new generation of students putting cats in the sky for the sake of a blog post.

Saturday, 5 September 2015

The good, the bad and the ugly

Dealing with contamination in supernova datasets

Picture this: you have a room full of people and the one burning question you want answered is "what is the average height of women in the room?". But the problem is, you don't know which people are women. The only information you do have is the length of each person's hair and their height. What do you do? Well, you can make an educated guess as to whether someone is a man or a woman based on the length of their hair. You could even assign a probability: people with longer hair are more likely to be women. Then you could cut your dataset based on the probability and assume everyone with a probability of, say, 90% or higher is a woman.

There's a problem with this approach though. You're going to lose a lot of your data, since many women might have a probability less than 90% because of short hair. Much worse, you'll also get some contamination. Men with long hair will contaminate your dataset and cause your estimate of the height to be biased, since they are, on average, taller. You can get around this by using a higher probability cut, resulting in less contamination and less bias, but then you'll get a much noisier estimate of the average height, since you have less data in your sample.
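
You can see this trade-off in a quick simulation. Everything in the sketch below is invented for illustration: the average heights and hair lengths, their spreads, and the assumption that hair length and height are unrelated within each group. The probability for each person comes from Bayes' theorem with equal numbers of men and women.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000  # people of each gender; all numbers here are made up

is_woman = np.repeat([True, False], n)
height = np.where(is_woman, rng.normal(165, 7, 2 * n), rng.normal(178, 7, 2 * n))
hair = np.where(is_woman, rng.normal(35, 12, 2 * n), rng.normal(10, 12, 2 * n))

def gaussian(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Probability of being a woman given hair length (equal priors).
like_woman = gaussian(hair, 35, 12)
like_man = gaussian(hair, 10, 12)
p_woman = like_woman / (like_woman + like_man)

def apply_cut(threshold):
    """Keep everyone above the probability cut and see what we are left with."""
    keep = p_woman >= threshold
    n_kept = int(keep.sum())
    contamination = float((~is_woman[keep]).mean())  # fraction of men kept
    bias = float(height[keep].mean() - height[is_woman].mean())
    return n_kept, contamination, bias

for threshold in (0.5, 0.9, 0.99):
    n_kept, contamination, bias = apply_cut(threshold)
    print(f"cut {threshold}: kept {n_kept}, "
          f"contamination {contamination:.3f}, bias {bias:+.2f} cm")
```

A stricter cut throws away more and more of the room to buy a cleaner sample, which is exactly the dilemma described above.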

This example might seem ridiculous, but it has very real applications in cosmology. Not with hair, obviously...

In the first half of my PhD, I worked on supernova cosmology and how to deal with contaminated datasets. A supernova is an exploding star, among the brightest events in the Universe, meaning we can see them far away. Supernovae are incredibly useful for cosmology because we think we know how bright they actually are, and hence by measuring how bright they appear, we can figure out how far away they are (much like a car driving towards you on a highway at night).

But there's a catch. Only one type of supernova is useful, because it's the only one that behaves well enough for us to have a good idea of its intrinsic brightness, and that's the type Ia supernova (pronounced "type-one-A"). Type Ia's can give us tight constraints on the fundamental parameters of the Universe, and are the main reason we know dark energy dominates the Universe.
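
The "known brightness gives distance" trick is usually written in terms of magnitudes: the difference between how bright something appears (m) and how bright it really is (M) satisfies m - M = 5 log10(d / 10 parsecs). Here is a minimal sketch of that relation. The peak absolute magnitude of -19.3 is a typical textbook value for type Ia's that I am assuming for illustration, and the calculation ignores redshift, dust and all the corrections a real analysis needs.

```python
import math

def distance_parsecs(apparent_mag, absolute_mag=-19.3):
    """Invert the distance modulus m - M = 5 * log10(d / 10 pc)."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

for apparent_mag in (14.0, 19.0, 24.0):
    d_mpc = distance_parsecs(apparent_mag) / 1e6
    print(f"apparent magnitude {apparent_mag}: about {d_mpc:.3g} Mpc away")
```

Every 5 magnitudes fainter means 100 times less light and therefore 10 times farther away, just like the headlights on the highway.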

The problem is, in order to know for sure what type a particular supernova is, you have to take a spectrum. It's a bit like doing a DNA test: it's a very accurate way of determining the type, but it's also very expensive. Taking an image is easy, but taking a spectrum needs a lot of light, a long observation and hence a lot of telescope time. There's also a time pressure, since a supernova will fade within 20-90 days of the initial observation, so it has to be followed up spectroscopically quickly. In the past, it was possible to follow up almost every detected object that was thought to be a supernova. New surveys such as the Dark Energy Survey (DES) and the upcoming Large Synoptic Survey Telescope (LSST) will make this impossible. LSST is expected to detect anywhere between 10 000 and 500 000 supernovae in its lifetime and it will simply be impossible to get enough telescope time to confirm them all spectroscopically.

So where does that leave us? Well back to the hair problem. Our type Ia supernovae are the women in the first example, and their average height corresponds to the cosmological parameters we can get from them. Non-Ia's are contaminants, causing biases in the sample. And just like in the above example, we have incomplete information. You can use the light curve of a supernova (its brightness as it changes with time), which is much easier to obtain than a spectrum, to get a probability that the supernova is a Ia, just like in the hair example. But you can never know for sure without a spectrum. So what we're left with is a contaminated supernova dataset, with only a probability of being a Ia for each object.

And this is where BEAMS comes in. BEAMS stands for Bayesian Estimation Applied to Multiple Species and is a great way to solve this problem. Instead of applying cuts to the dataset and allowing the contamination to bias your results, we use all the data but in a statistically consistent way that allows us to get good constraints without biases from the non-Ia's.

The difference with BEAMS is that at no point do we ever decide whether an object is a Ia or not; we allow it to be both. Going back to our example (note the following plots are basically entirely made up), if we plot the distribution of heights (i.e. we bin the heights of all the people in the room) and look at the average of all the heights, this clearly does not represent the average height of women in the room, even if we were to cut out a lot of the probable men from the sample.

Instead, with a mixture model, we model two populations instead of one and each population is allowed to have its own average (the parameter we're interested in). You can see that at the extreme ends of the distribution, it's quite clear which population a particular height is most likely to belong to, but somewhere in the middle it's hard to tell just from the data. These points contribute a fair bit to both average values (of men and women).

Remember I jumped up and down about probabilities earlier? They're really important in getting mixture models to be precise. If I have a person who could potentially belong to either population but they have really long hair, they have a much higher probability of being a woman and so contribute more to the women's average than to the men's. This allows us to get unbiased estimates of both parameters, whichever one we're interested in. Of course, the more reliable the probabilities, the better this estimate will be.
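
Here is a minimal sketch of the mixture-model idea applied to the room full of people. It is not the BEAMS code, which works with supernova brightnesses and cosmological parameters rather than heights. All the numbers are invented, the spread in heights is assumed known, and I use a plain grid search over the two averages rather than anything clever. Each person contributes p x (women's distribution) + (1 - p) x (men's distribution) to the likelihood, where p is their probability of being a woman.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000      # people of each gender; all numbers here are made up
sigma = 7.0   # spread in height, assumed known for simplicity

is_woman = np.repeat([True, False], n)
height = np.where(is_woman, rng.normal(165, sigma, 2 * n), rng.normal(178, sigma, 2 * n))
hair = np.where(is_woman, rng.normal(35, 12, 2 * n), rng.normal(10, 12, 2 * n))

def gaussian(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Probability of being a woman from hair length alone (equal priors).
like_woman = gaussian(hair, 35, 12)
p_woman = like_woman / (like_woman + gaussian(hair, 10, 12))

def log_likelihood(mean_women, mean_men):
    """Nobody is ever classified: each person is a weighted mix of both."""
    per_person = (p_woman * gaussian(height, mean_women, sigma)
                  + (1 - p_woman) * gaussian(height, mean_men, sigma))
    return np.log(per_person).sum()

grid_women = np.arange(155.0, 175.01, 0.25)
grid_men = np.arange(168.0, 188.01, 0.25)
loglike = np.array([[log_likelihood(w, m) for m in grid_men] for w in grid_women])
i_w, i_m = np.unravel_index(np.argmax(loglike), loglike.shape)
mean_women_hat, mean_men_hat = grid_women[i_w], grid_men[i_m]

naive = height[p_woman >= 0.5].mean()  # the "cut" approach, for comparison
print("mixture estimate (women, men):", mean_women_hat, mean_men_hat)
print("naive cut estimate (women):   ", round(float(naive), 2))
```

The mixture estimate uses every person in the room, while the naive cut is pulled upwards by the long-haired men who sneak through.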

So the idea behind BEAMS is to give up on trying to perfectly classify every object with spectroscopy and instead account for the contamination we know we're going to get by fitting for it with a mixture model.

The piece of the puzzle I actually worked on in my PhD was this: what do you do if the data are correlated? Correlated just means each point is not independent of the others. For example, if all the people in the room were made up of families, each person would be correlated with other members of their family since their heights would probably be similar.

It turns out it's computationally impossible to solve the original form of the BEAMS equation if the data are correlated... This was a serious problem, so I worked to show you can still do BEAMS with correlated data; you just have to switch part of the calculation from analytic to numerical. Ultimately, instead of going through every possible combination (of which there were far too many), we did some random sampling (with some guidance) to quickly get to the right solution. We repeated our analysis with over 10 000 simulated datasets to ensure we could accurately recover the cosmological parameters we were interested in, every time.
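
To see why correlations hurt so much, here is a toy version of the problem with made-up numbers. With N objects there are 2^N ways of labelling them Ia or non-Ia. When the data are independent, the sum over all those labellings collapses into a simple product of N terms. With a full covariance matrix it doesn't, and the brute-force sum is all you have. The sketch checks that the two agree in the independent case and shows how fast the number of combinations grows. It does not reproduce the guided random sampling we actually used, and the function names are hypothetical.

```python
import itertools
import numpy as np

def mvn_pdf(residual, cov):
    """Multivariate Gaussian density for a residual vector and covariance."""
    chi2 = residual @ np.linalg.solve(cov, residual)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (chi2 + logdet + len(residual) * np.log(2 * np.pi)))

def brute_force_likelihood(data, p_ia, mean_ia, mean_non, cov):
    """Sum over every possible Ia / non-Ia labelling: 2^N terms."""
    total = 0.0
    for labels in itertools.product([True, False], repeat=len(data)):
        labels = np.array(labels)
        prior = np.prod(np.where(labels, p_ia, 1 - p_ia))
        means = np.where(labels, mean_ia, mean_non)
        total += prior * mvn_pdf(data - means, cov)
    return total

def factorised_likelihood(data, p_ia, mean_ia, mean_non, sd):
    """Shortcut that is only valid when the data points are independent."""
    def gaussian(x, mean):
        return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return np.prod(p_ia * gaussian(data, mean_ia) + (1 - p_ia) * gaussian(data, mean_non))

rng = np.random.default_rng(3)
n_objects = 8
data = rng.normal(0.0, 1.0, n_objects)
p_ia = rng.uniform(0.1, 0.9, n_objects)

independent = brute_force_likelihood(data, p_ia, 0.0, 1.5, np.eye(n_objects))
shortcut = factorised_likelihood(data, p_ia, 0.0, 1.5, 1.0)
print("brute force:", independent, " shortcut:", shortcut)

for n in (8, 50, 1000):
    print(n, "objects ->", 2 ** n, "combinations")
```

Eight objects is fine, but by the time you have a few dozen supernovae the brute-force sum is already hopeless, which is why sampling becomes necessary.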

If this last bit is not a complete enough explanation for you, dig into the details in our paper. You can also read the original BEAMS papers in these links:

And more recently, a paper proposing to use BEAMS as part of a much larger, meta supernova analysis:

So supernova cosmology in the era of LSST is going to be new and exciting territory, but if we can measure the average height of women in a room, it seems we can do cosmology with exploding stars.