Artificial Neural Networks

(ELOG 7 for CS3790: Cognitive Science)

Artificial neural networks are kinda nifty. To learn more about them, I thought I’d try playing with one. Here are a couple of the places I tried: Blackjack with Reinforcement Learning and Java Mouse Neuron Test.

The first site uses an ANN to learn how to play Blackjack. You can start by playing alongside the computer in real time. The computer plays randomly at this point. If you want it to start playing better, it needs some alone time to learn the rules. I set it to 1000 learning episodes of 100 games each. As you run through the training you can watch the computer’s win average improve. It climbs steadily from about 30%, which is about what you get playing randomly, to 40%. It then sits around 40%, slowly improving. It’s impossible for the computer to really get above 50%, since the game’s odds are weighted in favor of the dealer (the applet doesn’t account for pushes or splits).

It’s interesting how, with just a few training episodes, the computer can start playing a noticeably better game. It’s hard to see the overall difference when playing hands as slowly as a human needs to, but you can see the computer making more informed decisions after the training. ANNs are kind of nifty like that.

I decided to try to improve the training. I experimented with different values for the winning and losing weights, trying to find the balance of reward and punishment that encourages the quickest learning. I found that weighting a win more heavily than a loss was a quick way to speed up learning, since losing is expected over 50% of the time anyway. Weighting the wins too heavily did cause a lot more oscillation, however, since a single chance win was more often enough to make the network think it understood what was going on.
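To make the reward weighting idea a bit more concrete, here’s a minimal sketch in Python. It is not the applet’s code: I’ve swapped the neural network for a plain lookup table of (state, action) values, simplified the game (infinite deck, no splits, pushes worth zero), and made up every name in it (play_game, train, win_reward, loss_reward, alpha, epsilon) for illustration. The only point is to show where the win and loss weights enter the update.

```python
import random

CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # infinite deck; ace stored as 1
ACTIONS = ("hit", "stand")

def hand_total(cards):
    """Best total, counting one ace as 11 when that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total

def play_game(q, rng, epsilon, win_reward, loss_reward, alpha=0.05):
    """Play one game, then nudge every visited (state, action) value toward the reward."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS), rng.choice(CARDS)]
    visited = []
    while True:
        state = (hand_total(player), dealer[0])
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # occasional random move (exploration)
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        visited.append((state, action))
        if action == "stand":
            break
        player.append(rng.choice(CARDS))
        if hand_total(player) > 21:
            break
    p = hand_total(player)
    if p > 21:
        outcome = -1  # player busts
    else:
        while hand_total(dealer) < 17:  # assumed dealer rule: hit below 17
            dealer.append(rng.choice(CARDS))
        d = hand_total(dealer)
        outcome = 1 if (d > 21 or p > d) else (-1 if p < d else 0)
    # This is where the win/loss weighting comes in.
    reward = win_reward if outcome == 1 else (loss_reward if outcome == -1 else 0.0)
    for key in visited:
        old = q.get(key, 0.0)
        q[key] = old + alpha * (reward - old)
    return outcome

def train(episodes, games_per_episode, win_reward=1.0, loss_reward=-1.0, seed=0):
    """Return the learned table and the win rate of each episode."""
    rng = random.Random(seed)
    q, win_rates = {}, []
    for _ in range(episodes):
        wins = sum(play_game(q, rng, 0.1, win_reward, loss_reward) == 1
                   for _ in range(games_per_episode))
        win_rates.append(wins / games_per_episode)
    return q, win_rates

if __name__ == "__main__":
    # Weight a win twice as heavily as a loss and compare early vs. late win rates.
    _, rates = train(200, 100, win_reward=2.0, loss_reward=-1.0)
    print("first 10 episodes:", sum(rates[:10]) / 10)
    print("last 10 episodes: ", sum(rates[-10:]) / 10)
```

Cranking win_reward way up makes each lucky win pull the stored values harder, which is roughly the oscillation I described above. A table like this will not behave exactly like the applet’s network, so treat any numbers it prints as a toy comparison only.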

The other ANN I played with didn’t have any sort of training whatsoever. The associated paper discusses designing a neural net that learns more like a person and less like a computer, without the hand-of-God involvement you’ll typically see. In the real world, there isn’t someone to tell you every time whether you responded correctly, so the network was designed to learn however it wanted, with no person grading its actions. After a few tries I was able to get the mouse to start circling the goal without any sort of training whatsoever. I’d like to play around with this idea of no-training ANNs further…
