Python ‘u’ in Front of a String: What Does It Do?

The ‘u’ in front of a string means the string is a Unicode string.

Unicode is a standard that lets a string represent far more characters than a regular ASCII string can.

In the Python 2.x shell, you can see how, for example, text written in Russian is displayed as a Unicode string with ‘u’ in front of it and a series of Unicode escape sequences in it:

>>> hello_rus = u"Привет, мир"
>>> hello_rus
u'\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440'

In Python 2.x, a Unicode string is marked with ‘u’.

However, in Python 3 all strings are Unicode strings by default. Thus the interpreter never displays ‘u’ in front of a string in Python 3. (Python 3.3 and later still accept the ‘u’ prefix in source code, but it has no effect.)
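As a small illustration of the Python 3 behavior, the sketch below compares a plain literal with a ‘u’-prefixed one. It assumes Python 3.3 or later, and the variable names are made up for this example:

```python
# Python 3.3+ accepts the u prefix for compatibility with Python 2 code,
# but it changes nothing: both literals are ordinary str objects.
plain = "Привет, мир"
prefixed = u"Привет, мир"

same_value = plain == prefixed              # the two strings are equal
same_type = type(plain) is type(prefixed)   # both are str
shown = repr(prefixed)                      # the repr carries no u prefix
repr_has_prefix = shown.startswith("u")

print(same_value, same_type, repr_has_prefix)
```

In other words, the prefix is purely cosmetic in Python 3 and mainly useful for code that has to run on both major versions.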

History of Unicode Values in Python 2

In Python 2, the string type str is a sequence of 8-bit values (bytes).

Using 8-bit characters, it is possible to represent the English alphabet and a limited number of additional characters. But a single 8-bit encoding cannot cover every other writing system, such as the Russian Cyrillic alphabet, at the same time.

This is because using 8 bits means there are only 256 different values available. If you combine the English alphabet, the Russian Cyrillic alphabet, and accented Western European characters with digits, punctuation, and the many other scripts in use, you will quickly find that 256 values are not enough to cover them all.
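The limit can be checked directly. The following is a rough sketch in Python 3 that uses Latin-1 as one example of a single-byte encoding (the choice of Latin-1 and the variable names are assumptions made for illustration):

```python
# Number of distinct values a single 8-bit byte can hold.
values_per_byte = 2 ** 8

# Latin-1 is one single-byte encoding. It covers Western European letters,
# so the Spanish letter fits into one byte.
spanish_bytes = "ñ".encode("latin-1")

# The same encoding has no slot for Cyrillic letters, so encoding fails.
try:
    "Привет".encode("latin-1")
    cyrillic_fits = True
except UnicodeEncodeError:
    cyrillic_fits = False

print(values_per_byte, spanish_bytes, cyrillic_fits)
```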

To represent characters outside of the English alphabet, Unicode characters are used.

In short, Unicode maps each character to a specific number called a code point. These code points can be converted to byte sequences with a variety of different encodings. (One common such encoding is UTF-8.)
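To make the difference between a code point and its encoded bytes concrete, here is a minimal Python 3 sketch. The variable names are only illustrative, and UTF-16 is included just to show that the same code point yields different bytes under a different encoding:

```python
char = "ñ"

# The code point is a single number assigned to the character by Unicode.
code_point = ord(char)                    # 241, i.e. 0xF1

# Encodings turn that number into bytes, each in its own way.
utf8_bytes = char.encode("utf-8")         # two bytes in UTF-8
utf16_bytes = char.encode("utf-16-le")    # two different bytes in UTF-16

# Decoding with the matching encoding restores the original character.
round_trip = utf8_bytes.decode("utf-8")

print(hex(code_point), utf8_bytes, utf16_bytes, round_trip == char)
```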

For example, let’s create a Unicode string with a Spanish ñ (in Python 2.x shell):

>>> word = u"El Niño"
>>> word
u'El Ni\xf1o'

When you display the word, the Python interpreter escapes the Spanish ñ letter and displays \xf1 instead. This is because the letter ñ is not in the printable ASCII range. Thus the interpreter's representation of the string replaces it with an escape sequence that uses only ASCII characters.

To see the actual character instead of its escape sequence, you need a terminal that supports displaying it.

In Python 2.x, you can simply print the Unicode string using the print statement. Python then encodes the string for the terminal and the character itself is shown, provided the terminal's encoding supports it. Older systems limited to ASCII could not do this.

>>> print word
El Niño

Notice that in Python 3.x the ‘u’ prefix is not needed for any of this, because all strings are Unicode strings by default.
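For comparison, the escaped form seen in the Python 2 shell can still be produced in Python 3 with the built-in ascii() function, while repr() keeps the character as it is. A short sketch, reusing the word from the example above:

```python
word = "El Niño"

# ascii() escapes every non-ASCII character, much like the Python 2 shell did.
escaped = ascii(word)

# repr() in Python 3 leaves printable non-ASCII characters untouched.
shown = repr(word)

print(escaped)
```

Here escaped holds the text 'El Ni\xf1o' (quotes included), whereas shown holds 'El Niño'.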

Conclusion

Today you learned what the ‘u’ in front of a string means in older versions of Python.

To recap, the ‘u’ in front of a string means the string is a Unicode string. This allows you to work with a wide range of characters, such as Russian Cyrillic characters.

However, the prefix is only needed in Python 2.x. In Python 3.x all strings are Unicode strings by default.