Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected: restart the kernel and then run all cells (in the menu bar, select Kernel → Restart And Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE".


Encoding Homework Exercises

Q1 One reason Unicode exists is to let us use strings containing characters beyond those on our keyboards. In terms of the discussion in the chapter, Unicode is about the strings we can use in our programs; how those strings map to a sequence of bytes (i.e., their encoding) is a separate concept.

When we know the code point of a Unicode character that is not on our keyboard (the code point is an index into the Unicode character set, usually written in hexadecimal), we can include that character in a string by writing the \u escape followed by exactly four hex digits of the code point. Code points above U+FFFF use \U followed by eight hex digits instead. Consider the Python string s:

s = "Unicode examples: \u2B2C and \u266A and \u1F60 and " \
    "\u265E and \u0394 and \u0402"
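The following sketch illustrates how the \u and \U escapes behave; the variable names delta and grin are only for illustration. It also shows that the "\u1F60" in s is the four-digit code point U+1F60 (a character in the Greek Extended block), not the start of the longer code point U+1F600.

```python
# Each \u escape consumes exactly four hex digits and yields one character.
delta = "\u0394"
print(len(delta), hex(ord(delta)))                # 1 0x394

# Code points above U+FFFF need \U followed by eight hex digits.
grin = "\U0001F600"
print(len(grin), hex(ord(grin)))                  # 1 0x1f600

# So "\u1F60" is U+1F60 itself, a single character.
print(len("\u1F60"), "\u1F60" == "\U00001F60")    # 1 True
```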

Write code to print s, then assign to b8 the UTF-8 encoding of s and to b16 the UTF-16BE encoding of s. For each, call the hex() method of the bytes data type to get a hexadecimal string of the encoded values. Put your Python code in the following code cell. Then, in the subsequent Markdown cell, answer these questions:

  • Which of the two hex representations is longer?
  • Give explicit lengths for b8, b16, and the two hex() results.
  • How do these lengths compare to the length of s?
In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE
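As a sketch of the str.encode() and bytes.hex() calls Q1 relies on, here they are applied to a smaller sample string rather than to s (the names sample, u8, and u16 are illustrative). Note that hex() writes each byte as two hex digits, so a hex string is always twice as long as the bytes it came from.

```python
# "A", GREEK CAPITAL LETTER DELTA (U+0394), EIGHTH NOTE (U+266A)
sample = "A\u0394\u266A"

u8 = sample.encode("utf-8")        # 1 + 2 + 3 bytes
u16 = sample.encode("utf-16-be")   # 2 bytes per character (all are below U+10000)

print(len(sample), len(u8), len(u16))   # 3 6 6
print(u8.hex(), len(u8.hex()))          # 41ce94e299aa 12
print(u16.hex(), len(u16.hex()))        # 00410394266a 12
```

The two encodings tie here only by coincidence: "A" costs one byte in UTF-8 but two in UTF-16BE, while the eighth note costs three bytes in UTF-8 but two in UTF-16BE.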

Q2 Write a function

shiftLetter(letter, n)

whose parameter letter should be a single character. If the character is between "A" and "Z", the function returns the uppercase character $n$ positions further along, wrapping around to "A" if the shift goes past "Z". Likewise, it should shift the lowercase characters between "a" and "z". If letter is anything else, or is not of length 1, the function should return letter unchanged.

Hint: review the functions ord() and chr() from the section, as well as the modulus operator %.
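A worked example of the hint, shifting one uppercase letter by hand; it is not a full shiftLetter, and the names offset, new_offset, shifted, and back are illustrative. The idea is to convert the letter to a position 0..25, add the shift modulo 26, and convert back.

```python
letter, n = "Y", 3
offset = ord(letter) - ord("A")      # 24: position of "Y" within A..Z
new_offset = (offset + n) % 26       # 27 % 26 == 1, so the shift wraps past "Z"
shifted = chr(ord("A") + new_offset)
print(shifted)                       # B

# With a positive modulus, Python's % never returns a negative result,
# so a negative shift also wraps: "a" shifted by -1 lands on "z".
back = chr(ord("a") + (0 - 1) % 26)
print(back)                          # z
```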

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert True

Q3 Building on the previous exercise, write a function

encrypt(plaintext, n)

that applies shiftLetter (with shift n) to each character of plaintext, accumulating the results, and returns the resulting string.
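The accumulate pattern that Q3 describes can be sketched with a hypothetical helper, map_chars, that applies any per-character function; here the built-in str.upper stands in for shiftLetter purely for illustration.

```python
def map_chars(text, char_fn):
    """Apply char_fn to every character of text and return the joined result.

    map_chars is a hypothetical helper used only to illustrate accumulation.
    """
    result = ""                  # accumulator starts as the empty string
    for ch in text:
        result += char_fn(ch)    # append the transformed character
    return result

out = map_chars("abc, xyz!", str.upper)
print(out)                       # ABC, XYZ!
```

The same accumulation can also be written as "".join(char_fn(ch) for ch in text), which avoids building intermediate strings.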

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert True

Q4 Write a function

singleByteChars(s)

that determines whether every character in its argument s can be encoded in a single byte. The function should return the Boolean True if so, and False otherwise.
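The exercise does not name an encoding, so the following sketch assumes that "single byte" means the character's code point fits in one byte (0 through 255, as in latin_1). It only inspects individual characters; the names sample and fits are illustrative. Under UTF-8, by contrast, only ASCII characters (code points below 128) take a single byte.

```python
# Assumption: a character "fits in one byte" when its code point is 0..255.
sample = "A\u00e9\u2B2C"                 # "A", e-acute, BLACK HORIZONTAL ELLIPSE
fits = [ord(ch) < 256 for ch in sample]
for ch, ok in zip(sample, fits):
    print(hex(ord(ch)), ok)
# 0x41 True
# 0xe9 True
# 0x2b2c False

# Under UTF-8, e-acute (code point 233) needs two bytes.
print(len("\u00e9".encode("utf-8")))     # 2
```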

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
print(singleByteChars("Hello \u2B2C"))
singleByteChars("Hello")
In [ ]:
assert True

Q5 Suppose your Python program has a variable that refers to a bytes object, just as mystery refers to the bytes literal given here:

mystery = b'\xc9\xa2\x95}\xa3@\x89\xa3@\x87\x99\x85'\
          b'\x81\xa3@\xa3\x96@\x82\x85@\xa2\x96\x93'\
          b'\xa5\x89\x95\x87@\x97\x99\x96\x82\x93\x85'\
          b'\x94\xa2o@@\xe8\x96\xa4@\x82\x85\xa3Z'

Perhaps this value came from a network message or from a file. You suspect that it actually holds the bytes of a character string, and you need to figure out how that string is encoded. Assume that you have narrowed the possible encodings down to the following:

  • 'UTF-8'
  • 'UTF-16BE'
  • 'cp037'
  • 'latin_1'

Write code to convert the byte sequence to a character string and determine the correct encoding. By the end of your code, assign the correct decoding to s.
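One way to compare candidates is to attempt every decoding and catch failures. The sketch below uses a hypothetical helper, try_decodings, and a stand-in byte string produced from "Hi!" with cp037, not the mystery bytes.

```python
def try_decodings(data, encodings):
    """Return {encoding: decoded text, or None if decoding raised an error}.

    try_decodings is a hypothetical helper for illustration.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None      # the bytes are invalid under this encoding
    return results

sample = "Hi!".encode("cp037")       # stand-in bytes, not the mystery value
candidates = ["UTF-8", "UTF-16BE", "cp037", "latin_1"]
decoded = try_decodings(sample, candidates)
for enc, text in decoded.items():
    print(enc, repr(text))
```

A decoding that raises no error is not necessarily correct. latin_1 maps every possible byte to some character, so it never fails, and here UTF-8 also succeeds but produces gibberish. UTF-16BE fails only because this sample has an odd number of bytes. You still have to read the candidates and judge which one is sensible text.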

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert True