5/07/2012

Free Will and Determinism

I recently read the introduction to History of Civilization in England by Henry Thomas Buckle (1821-1861). In this long essay, he outlined his dissatisfaction with the study of history as a mere collection of facts and called for its transformation into a proper science, one that generates knowledge in the form of causal connections. He made many insightful remarks, and the one that got me thinking is about free will vs. determinism. He obviously took the determinist position when he claimed that, given enough information about the circumstances of a well-acquainted friend, he could predict with certainty what that friend would do next. He also keenly pointed out a problem with free will. Before we can have a meaningful discussion, I want to make explicit the essential conception of free will. We think of a free will as an active agent capable of dictating one's thoughts and actions independently of one's experience or the given circumstances. Buckle's insight is that, given that the physical universe evolves according to deterministic laws, willing freely requires another authoritative agent, independent of those laws, to inject the decision into our consciousness. But how does that agent operate? If it lives in a world that also evolves according to some deterministic laws, then, since it has to be free as well, it would require yet another free active agent. (This kind of reasoning is not unlike Descartes's notion of mind-body duality, which Gilbert Ryle wittily summarized as the "ghost in the machine".) Thus we enter an infinite regression that fails to evade the contradiction between determinism and free will.

This is a tough question because one is reluctant to give in entirely to either side. Some people have suggested that quantum physics might hold the answer because it allows truly random events. I was initially unsatisfied with such a proposal because we want to think of the will as deciding actively rather than randomly. But lately I came to realize that, in the observation of a single act, the two are indistinguishable! My worry came from my subjective experience of activeness. Let's put aside the subjective content and imagine an experiment in which Buckle watches his well-acquainted friend for a while and then predicts what his friend will do next. If free will exists, Buckle's friend will be able to decide on things independently of his experience, and as a consequence Buckle will not be able to predict with certainty even with all the information. To an observer, that is precisely what you would expect from a random event whose outcome is independent of the past. Although our subjective experience is too intricate to give us direct clues, the unexpectedness of a certain decision might have germinated from a seed of randomness allowed by quantum physics.

This resolution proposes that we give up the deterministic view, which is arguably suggested and supported by classical physics, and allow for some non-deterministic events, which are supported by quantum physics, our current best knowledge. This answer might still upset people who believe in free will, because randomness seems to dismiss one's active control over his or her life. But the pursuit of the possibility of control only pulls us back into the infinite regression. While the moral implications of the absence of an active agent are interesting to consider, I think I will spend more time pondering how quantum mechanics allows for randomness when all its laws, like those of classical physics, are deterministic (in probability). In quantum mechanics, randomness creeps in at the act of measurement: the input and the laws predict the probability of each outcome, but not the outcome itself. I will leave further discussion of this topic to a future article.

Let's revisit our thought experiment and introduce a modification: instead of predicting the exact action of his friend, Buckle will predict the probability of each possible action, and we shall repeat the experiment many times (maybe with identical clones of his friend). If free will can actively control one's decisions, Buckle's friend can decide how frequently he wishes to do a certain thing, and thus he can manipulate the frequency of each action and render Buckle's predicted probabilities incorrect. This is why the condition of "a single act" is needed in my statement of indistinguishability. However, this experiment is hard, if not impossible, to carry out, because the state of Buckle's friend changes after each decision and cloning the quantum state of Buckle's friend is impossible: we would need to find some decision whose probability does not change when repeated. Hence a definite answer to whether free will exists, and if so whether it exists in the weak, random form or the strong, ghost form, is not known yet.

5/02/2012

Text Encoding Conversion

I recently learnt about text encoding and was motivated to write a simple program to convert the MP3 tags in batches (most of my Chinese songs' tags were not encoded in UTF-8, the standard across many platforms nowadays). I will try to give a list of the essentials about text encoding and conversion and then talk a bit about the program I wrote.

What are Encoding, Decoding and Conversion?
1. Characters, i.e. symbols, need to be stored in a (binary) physical representation on the computer. The mapping from the symbols to the physical representation is called encoding and the inverse mapping is called decoding. For example, when you read a text file from your hard drive to display on the screen, the program decodes the file content to know what to draw on the screen, and when you save the text file, the program encodes the content into the file. As you might imagine, there are many encoding schemes, or codecs, out there. This creates a problem when a program reads a file with a codec different from the one used to save it.

2. Conversion is a mapping from one physical representation to another such that the decoded text of the output is the same as the decoded text of the input. Conversion is tedious because we need to construct a mapping for each desired pair of encodings, which might follow very different structures, and so it is hard to automate. (If you are lucky, people have written it before you.) Then Unicode comes to rescue us. Unicode solidifies the identity of symbols, the abstract beings that we humans really care about, by assigning a code point to each linguistic symbol in every major language in the world. So now we can simplify the conversion between any pair of encodings by first decoding the binary to Unicode -- this mapping has been written for most encodings in common use -- and then encoding the result into the desired encoding. Python provides very good Unicode support and Python 3's strings are represented in Unicode (handy for handling file names).

3. It is hard for a program to determine what encoding a file is saved in given only the file content in binary. There are various protocols to communicate this information, e.g. the charset declaration in an HTTP response and the BOM prefix that Windows adds to txt files. Another solution is to agree on an encoding large enough to accommodate most, if not all, linguistic symbols, so that every program from now on can assume this encoding: it is as if everything were written down in the same language. The standard now is UTF-8, so you usually want to convert text in other encodings to UTF-8 for compatibility with the latest software.
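
To make point 2 concrete, here is a minimal Python 3 sketch of conversion through Unicode. The function name convert and the sample text are mine, chosen only for illustration; decode and encode are the standard bytes and str methods.

```python
def convert(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes from one encoding to another by going through Unicode."""
    text = raw.decode(source)   # bytes -> Unicode code points
    return text.encode(target)  # Unicode code points -> bytes in the new encoding


# A two-character Chinese string stored as GBK bytes.
gbk_bytes = "中文".encode("gbk")

# Decoding with the wrong codec does not fail here; it silently produces
# Latin-1 letters instead of Chinese characters.
mojibake = gbk_bytes.decode("latin-1")

# Decoding with the right codec and re-encoding gives proper UTF-8.
utf8_bytes = convert(gbk_bytes, "gbk")
```

The only thing the caller has to know is the source encoding; the pairwise mapping between GBK and UTF-8 never has to be written by hand.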

What does my program do?
As you would expect, it converts MP3 tags encoded in <encoding> into UTF-8. You need to supply your guess for <encoding> -- I will talk briefly about auto-detecting the encoding later. By default, the guess is GBK, the most common encoding in Chinese songs' MP3 tags. This is the culprit that caused all the mess in displaying song information on mobile devices. For a list of acceptable codecs, look here: http://docs.python.org/library/codecs.html#standard-encodings. One cool feature of my program is that it converts the tags character by character and preserves well-formed UTF-8 characters. This design has two advantages. First, I observed that some tags mix characters in UTF-8 with characters in another encoding: where the other encoding lacked a needed character, that character was encoded in UTF-8 instead. This technique solves the problem easily. Second, and more importantly, this technique makes it safe to run the program multiple times over the same files, because it won't change previously converted content that is already in proper UTF-8.
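
The character-by-character idea can be sketched roughly as follows. This is not the code from my gist, only an illustration under some assumptions: the tag is available as raw bytes, the guessed encoding uses at most two bytes per character (true of GBK), and the names utf8_length and convert_mixed are made up for this example.

```python
def utf8_length(lead: int) -> int:
    """Length of the UTF-8 sequence announced by a lead byte, or 0 if none."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def convert_mixed(raw: bytes, guess: str = "gbk") -> str:
    """Decode bytes that may mix well-formed UTF-8 with the guessed encoding."""
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte < 0x80:                      # plain ASCII, same in both encodings
            out.append(chr(byte))
            i += 1
            continue
        length = utf8_length(byte)
        if length:
            try:                             # keep a well-formed UTF-8 character
                out.append(raw[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass
        for width in (2, 1):                 # fall back to the guessed encoding
            try:
                out.append(raw[i:i + width].decode(guess))
                i += width
                break
            except UnicodeDecodeError:
                continue
        else:                                # undecodable byte: mark it and move on
            out.append("\ufffd")
            i += 1
    return "".join(out)


mixed = "中".encode("gbk") + "é".encode("utf-8") + b" ok"
once = convert_mixed(mixed)
twice = convert_mixed(once.encode("utf-8"))  # running it again changes nothing
```

One caveat of this sketch: a pair of GBK bytes can occasionally look like a well-formed UTF-8 character, in which case it is kept as UTF-8 by mistake.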


Using my program
To use this utility (you will need Python and the mutagen library), download and save the program from here: https://gist.github.com/2578542. Then, in your command prompt or shell, type:
python mp3_tag.py [<dir>]
where <dir> is the directory of MP3 files (handles the current directory if <dir> is missing). This is an example output (a log file will be created too):


A note for Windows users
I used mutagen to access and save MP3 tags. It reads many different formats of MP3 tags but saves all output tags uniformly in ID3v2.4 format. This format works fine on mobile devices and various modern MP3 players, but it is not supported by the Win 7 file explorer or Windows Media Player. (They only support tag versions up to ID3v2.3.) So after conversion, you will see many "?" in the MP3 tags there. To solve this problem, you will need to convert the tags to ID3v2.3 with other tools. I used iTunes to do that (right click on the highlighted song(s), choose "Convert ID3 Tags..." and then select ID3v2.3):

How to detect the encoding?
This is a nontrivial problem because it takes more than a program to determine whether the decoded text makes sense. The funny symbols you get by using a wrong codec are perfectly normal in a different language. It takes a lot of intelligence to determine whether you used the correct codec. Some modern browsers provide some capability to detect the encoding of a page. One idea would be to check whether the decoded text falls into the range of frequent characters of some language. But this only rules out some wrong guesses, since a page might contain a wider mix of symbols than you assumed. In the context of MP3 tags, suppose that Big5 and GBK both decode the same bytes, but into very different texts. Assuming that songs use only frequent characters is not a good way to choose between them, because some song names do use rare characters. One solution I thought of is to use search engines to select the codec whose text gives the most results on the web. One problem is that many songs on the web are decoded wrongly in the same way as the one sitting on your computer (so wrongly decoded texts often yield many more results than you would expect). The other solution I thought of (and actually used to fix conversion errors) is to use SoundHound, a very cool app that can reverse search songs, to listen to a song and return the song information to me. (This method could also help you recover the names of forgotten songs on your computer.) Right now, it is a bit silly and time-consuming to do this for a lot of songs, as it must be done by hand (unless you reverse engineer SoundHound). I am hoping that SoundHound will release an API soon so that this could be automated. (Besides, one could use its API to make a Sing Something!)
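
For the curious, the frequent-character idea could be sketched like this in Python 3. It is only a heuristic with the weaknesses described above, not something my program does. I am assuming that the frequently used Chinese characters sit under lead bytes 0xB0-0xD7 in GB2312 and 0xA4-0xC6 in Big5; the table and function names are made up for this example.

```python
# Assumed "frequent character" lead-byte ranges for each candidate codec:
# (encoding used to look up the character, lowest lead byte, highest lead byte)
FREQUENT_LEAD_BYTES = {
    "gbk": ("gb2312", 0xB0, 0xD7),
    "big5": ("big5", 0xA4, 0xC6),
}


def frequent_ratio(raw: bytes, codec: str) -> float:
    """Fraction of non-ASCII characters that look frequent under this codec."""
    table, low, high = FREQUENT_LEAD_BYTES[codec]
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError:
        return -1.0                     # the bytes are not even valid in this codec
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    if not non_ascii:
        return 0.0                      # nothing to judge by
    hits = 0
    for ch in non_ascii:
        try:
            encoded = ch.encode(table)
        except UnicodeEncodeError:
            continue                    # character is outside the table entirely
        if len(encoded) == 2 and low <= encoded[0] <= high:
            hits += 1
    return hits / len(non_ascii)


def guess_codec(raw: bytes) -> str:
    """Pick the candidate codec with the highest frequent-character ratio."""
    return max(FREQUENT_LEAD_BYTES, key=lambda codec: frequent_ratio(raw, codec))
```

A song title made of rare characters can easily fool this, and ties (for example, pure ASCII input) simply fall to the first candidate, so the result is a guess to be checked by eye.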

Coda
Regarding the source code, I do not provide any warranty, but you are free to do whatever you like with my code. It edits some tags of your MP3 files, specifically only the album, artist, album artist, performer, and genre fields (you can modify the code to edit more or fewer fields), so you might wanna test it on a few duplicates before running it on your entire music library.

5/01/2012

Campus Safe Ride Lookout App

This is my submission to the 2012 UChicago Mobile App Challenge. It is a rather simple idea inspired by the need to wait outside at night (possibly in bad weather) for the University of Chicago safe ride shuttle, because no information about the shuttle's arrival is available ahead of time. This app aims to solve exactly this information communication problem.

The Idea
This app will be constantly on the lookout for the safe ride shuttle for you.

The Problem
Safe ride is a great service provided by the University, and many friends I know use it regularly to get to places not served by the University shuttle at nighttime or in bad weather. The most important concern safe ride tries to meet -- they put it in the name -- is safety. Arguably, the convenience of the rider is another concern. But these commitments have been challenged by the unpredictable wait time. This creates problems: standing in the cold or at very late hours is not very pleasant, much less safe. At the same time, safe ride drivers would like the riders to be ready to get on the shuttle when they arrive. It is a very reasonable request that saves everyone's time, but in practice, because the approach of the shuttle is not always well communicated in advance and the wait time is unpredictable, many riders and drivers miss each other. The result is that both the riders and the drivers are unhappy.

My app will solve these problems to make using safe ride even safer and more convenient. By giving users more information about the location of the shuttle and notifying them when it comes within their vicinity, the app lets users minimize the time they are exposed to the cold or to danger, and lets the shuttle wait less and run more efficiently.

The App
This location-based service (LBS) app notifies its user to walk outside when a safe ride shuttle is approaching the user's location. There will be two components to this app. One component lives on the shuttle driver's smartphone; it acquires location data (via GPS or Wi-Fi) and sends updates of the shuttle's location to a server backend via the data network or SMS. The other component lives on the user's phone; it acquires the user's location (via GPS or Wi-Fi) and shows the safe ride shuttle's latest location on a map (only) when the shuttle comes within the proximity of the user, so that the user can get prepared and head outside before the shuttle arrives.
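
The proximity check at the heart of the user-side component could look something like the following Python sketch. Nothing here is from an actual implementation: the function names, the 500 m radius and the sample coordinates are all placeholders, and a real app would run an equivalent check in whatever language the phone platform uses.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def shuttle_is_near(user, shuttle, radius_m: float = 500.0) -> bool:
    """True when the shuttle is within radius_m of the user.

    user and shuttle are (latitude, longitude) pairs; the default radius
    is an arbitrary placeholder, not a tested value.
    """
    return distance_m(user[0], user[1], shuttle[0], shuttle[1]) <= radius_m


# Made-up coordinates for illustration only.
user = (41.7900, -87.6000)
nearby_shuttle = (41.7910, -87.6000)   # roughly 110 m north of the user
distant_shuttle = (41.8300, -87.6000)  # several kilometers away
```

The app would show the map and send the notification only while shuttle_is_near returns True for the latest reported shuttle location.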

Limiting the display of the shuttle's location to the user's vicinity is designed to enhance security by informing only the audience that needs it. The app will also require a UChicago email address to sign in, which makes the shuttle's location information even more secure. There is a lot more this system can do beyond these basic capabilities. For example, the user can make a reservation from the app with one click, since the app knows his or her location (and with this feature, we can disclose the shuttle's location only to users who have made a reservation).


This is my very first YouTube video, in which I proposed this app:

If you like the idea, like the video, share it with your friends and vote for Safe Ride Lookout at the 2012 UChicago Mobile App Challenge! I will post progress updates here until the project becomes significant enough to move to its own dedicated space.

Update

5/10/2012
Vote for this app at: http://techincubator.uchicago.edu/page/vote-your-favorite-app so safe ride may survive and improve. My hope is that this app along with an automated reservation-dispatch system, which I suggested above, might save the cost needed to keep this service alive.