Introduction

There are four short labs, and you are expected to complete all of them. This will take more than a single two-hour lab session, and the assignments are not ordered by difficulty, so you will get the most out of your lab time if you try to solve the labs in advance:

  1. Linux/Windows Unicode handling of paths
  2. Cross-platform copy_file code (POSIX/WIN32, C++17, and Qt)
  3. Handling of translations (gettext / linguist)
  4. Numerical and date conversions (only Qt)

The lab skeleton is on GitLab, and the instructions for the individual parts are below.

Warning

These are cross-platform assignments. The GitLab repository is set up for the IDA labs, which use GCC cross-compilers and Wine to run Windows programs under Linux. While it is possible to use other computers (Linux, Windows with WSL, maybe macOS, etc.), it is probably easiest to use SSH or ThinLinc to access the lab environment.

1. Unicode file handling

The standard C (and POSIX) method of opening files for reading or writing is the fopen function call. The lab skeleton for this part is in the directory unicode. Enter the directory and run the make command on a Linux computer; this compiles unicode.c into a Linux executable called unicode and the corresponding Windows executable called unicode.exe.

Run the executables using ./unicode and wine unicode.exe (or by compiling and running the code on a Windows computer). Look at which files and what output were generated.

  • Explain what happened before you modified the code, and why.
  • Complete the my_fopen function to open filenames encoded in UTF-8 on Windows.
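To make the encoding difference concrete, here is a minimal sketch of what converting a UTF-8 file name into UTF-16 code units involves. The function name utf8_to_utf16 is hypothetical and not part of the lab skeleton, and the decoder assumes well-formed input. A Windows solution would normally rely on the platform's own conversion routine rather than a hand-written decoder like this one.

```cpp
#include <cstddef>
#include <string>

// Decode a UTF-8 byte string into UTF-16 code units.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;      // the decoded code point
        std::size_t len;  // number of bytes in this UTF-8 sequence
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For the file name 马Häst马.txt, the UTF-8 form is 15 bytes long while the UTF-16 form is 10 code units, so the two representations cannot be passed interchangeably to an API that expects the other.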

Note

printf statements might output gibberish on Wine or Windows because the terminal does not expect UTF-8-encoded content to be printed directly to it. Verify that the correct files were created instead of relying on the printed strings.

2. Cross-platform file copying

There are many ways to copy files, some better than others. One problem when writing cross-platform code for copying files is that POSIX, for example, does not have a function to perform this common operation (although it does have an operation to link two files). There exist OS-specific functions in Unix-like operating systems to copy files; see for example reflink.c in the lab skeleton for how to make a fast copy on the btrfs filesystem in Linux. One cross-platform method of copying a file is of course to read from one file and write to another, but that may be slower and more verbose than it has to be, especially if file permissions need to be preserved.
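The read-and-write approach mentioned above could look roughly like the following sketch. The function name copy_by_streams and the buffer size are assumptions chosen for illustration, and the code deliberately shows the drawbacks: it is verbose and it does not preserve permissions or other metadata.

```cpp
#include <fstream>
#include <string>

// Copy src to dst by streaming bytes through a buffer.
// Returns true on success. Permissions and timestamps are NOT preserved.
bool copy_by_streams(const std::string& src, const std::string& dst) {
    std::ifstream in(src, std::ios::binary);
    if (!in) return false;  // source missing or unreadable
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!out) return false; // destination cannot be created
    char buf[4096];
    while (in) {
        in.read(buf, sizeof buf);
        // gcount() is the number of bytes actually read (may be 0 at EOF).
        out.write(buf, in.gcount());
    }
    out.flush();
    // Success means we stopped because of end-of-file, not a read error,
    // and every write went through.
    return in.eof() && out.good();
}
```

Note that the narrow-string file names here are interpreted according to the platform's conventions, so on Windows this sketch runs into the same encoding issue as fopen in the first lab.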

The lab skeleton for this part is in the directory copy_file, and you are expected to copy the file 马Häst马.txt to the destinations win32.txt, cpp-win.txt, cpp-linux.txt, and qt.txt. You may omit error checking; the makefile compiles the code, tries to run the programs, and checks whether the files have been created:

  • Windows using the WIN32 API (copy_file/win32.c). Note that it has a special convention for Unicode string literals, L"马Häst马.txt", which you are not allowed to use in the first lab; it creates a wide-character (wchar_t) literal, which is UTF-16 on Windows and UTF-32 on Linux.
  • Windows and Linux using standard C++17 code (copy_file/cpp17.cpp).
  • Linux using Qt (copy_file/qt.cpp).

3. Adding translations to a simple application

In this lab, you will add translations to a (very) simple program using GNU gettext and Qt Linguist. Start by modifying the files gettext-main.c and linguist-main.cpp, marking strings for translation (calling gettext("str") or _("str") if the string was "str"). Also update gettext.c, giving the paths to the setup routines.

For gettext, run xgettext --keyword=_ -c gettext-main.c, which gives you a file messages.po (no file is generated if you skipped the steps above). Update the character set in it to UTF-8 using a text editor. You can then open and save the file using Qt Linguist (or a tool actually designed for GNU gettext, or a text editor, since it is such a small file). Add a translation for Swedish; input whatever text you like as the translation, but make sure to include some code points outside the ASCII set. When you are done, run msgfmt --use-fuzzy messages.po, which gives you a file messages.mo (you should not use fuzzy strings by default, but Linguist might have marked some as fuzzy and this saves you some headache). Copy this file to the location you set in the setup routines (see the documentation for the strings that are appended to the path you give; typically dirname/sv/LC_MESSAGES/DOMAIN.mo). Run the code and make sure you get the translated output.

Note

Gettext .po and .mo files are used in many different contexts (including software such as WordPress and Sphinx), and there are several graphical user interfaces for performing the translations.

Note

You need to use bind_textdomain_codeset to actually see the UTF-8 characters in your terminal later on.
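As a rough illustration of how the gettext setup routines fit together, consider the sketch below. The helper name init_translations is hypothetical, the domain name lab3 is borrowed from the strace example in this section, and the directory argument is whatever path you chose for your translation files; the actual skeleton may organize this differently.

```cpp
#include <clocale>
#include <libintl.h>

// Hypothetical helper: set up message lookup for the "lab3" domain.
// locale_dir is assumed to contain sv/LC_MESSAGES/lab3.mo.
bool init_translations(const char* locale_dir) {
    // Use the locale from the environment (LANG, LC_ALL, ...).
    // If this fails, the program simply stays in the "C" locale.
    std::setlocale(LC_ALL, "");
    // Tell gettext where the catalogs for this domain live.
    if (bindtextdomain("lab3", locale_dir) == nullptr) return false;
    // Ask for translated strings to be returned as UTF-8.
    bind_textdomain_codeset("lab3", "UTF-8");
    // Make "lab3" the default domain for gettext() calls.
    return textdomain("lab3") != nullptr;
}
```

After a successful call, gettext("str") returns the translation if one is found and otherwise returns the original string unchanged, which is why a misplaced .mo file shows up as untranslated output rather than as an error.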

Note

A very useful command if you have difficulty finding the location to put translation files is strace:

strace ./gettext |& grep [.]mo

Typical output for success (new file descriptor = 3) or failure (returns -1):

openat(AT_FDCWD, ".../sv/LC_MESSAGES/lab3.mo", O_RDONLY) = 3
openat(AT_FDCWD, ".../sv/LC_MESSAGES/lab3.mo", O_RDONLY) = -1 ENOENT (No such file or directory)

For the second part, you will use Qt. Note that Qt is designed for GUI applications and the way translations are done might seem a little odd, but it simply boils down to passing translatable objects to a GUI widget, which is associated with an application, which in turn has a translator associated with it. For this lab, it is possible to keep the puts calls, but you are free to replace them with other Qt objects if you find a solution that you like better. As with gettext, an application will usually contain logic to select settings based on the user’s environment variables (or other methods for locale settings as appropriate for the operating system). In this lab, however, you may hard-code the location of the machine-readable translation file (which is called .qm and is similar to gettext .mo files; a file called .ts corresponds to the .po file).

To generate the .ts file, call lupdate Linguist.pro. Open it in Qt Linguist and choose Release in the GUI to create the .qm file (or call lrelease manually). The results should be similar to those of gettext. Which did you find easier? Note that you have used only a small part of the features these libraries provide.

Note

There is a tutorial for creating translations in the Qt documentation.

Note that tr("str") inside the QCoreApplication does not translate the strings directly, but if you use QObject::tr("str") it should work as expected.

4. Numerical and date conversions

One problem with formatting text for the user arises when your application also expects textual input formatted using the standard C locale. For example, suppose you are writing a JSON parser that encounters the string 1.2345 and your application has called

setlocale(LC_NUMERIC, "sv_SE.UTF8");

Standard routines such as strtod() will then fail to parse that string completely: the Swedish locale uses a comma as the decimal separator, so parsing stops at the decimal point.
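One way to observe this behavior is to check how much of the string strtod() actually consumed. The helper below is a small sketch with a hypothetical name, parses_completely; it does not change the locale itself.

```cpp
#include <cstdlib>

// Returns true only if strtod() consumed the entire (non-empty) string
// under whatever LC_NUMERIC locale is currently active.
bool parses_completely(const char* text) {
    char* end = nullptr;
    std::strtod(text, &end);
    // end == text means nothing was parsed; *end != '\0' means that
    // parsing stopped early, for example at an unexpected separator.
    return end != text && *end == '\0';
}
```

In the default C locale this accepts "1.2345" and rejects "1,2345". After the setlocale call above succeeds, the expectation is that the two results swap.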

Write a simple program in C or C++ (easier, but still tricky) or Python (it is recommended to install Python packages that perform this task since the base Python does not support it). It does not really matter which operating system you use for this.

Note

C++20 date functions in #include <chrono> should be added with GCC 9, which is available on the IDA lab systems since HT2021.

The program should read a date and a floating-point number from a constant string in the C locale (the date as output by LC_ALL=C date +%c) and output this date and number using a Swedish locale (or another locale that gives different results than the C locale). There exist many different solutions to this problem, but calling setlocale(LC_NUMERIC, "C") is not an acceptable solution, since it changes the locale for all threads in the program (you might end up printing a number in the wrong format if another thread prints while the locale is temporarily changed in order to read the number). Neither is performing the reading before setlocale() is called. For C, you are allowed to use GNU or Windows-specific extensions that deal with these problems. The following code snippet shows the expected order of function calls (with some fictitious names):

setlocale(LC_ALL, "sv_SE.UTF8"); // If using C
locale swedish("sv_SE.UTF8"); // C++
locale::global(swedish); // C++
cout.imbue(swedish); // C++; you probably want this for printing using cout

f = convertEnglishStringToDouble("1.2345", ...);
d = readEnglishStringToSomeSortOfDate("Fri Jul  5 05:04:02 2019", ...);
printStringInSwedish(f, ...); // prints 1,234500
printDateInSwedish(d, ...); // prints fre  5 jul 2019 05:04:02

Note

The standard functions for reading and printing dates are strptime() and strftime(). Like strtod(), strptime() will fail if it tries to read a string in the C locale while you are in a Swedish locale.

Note

While you are not allowed to call setlocale to switch locales temporarily inside the program, you are allowed to use other functions that change the locale only for a single thread (change it back afterwards), or that set the locale for a single stream or function call.
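As one possible illustration of setting the locale for a single stream, the C++ sketch below parses a number with the classic C locale and formats it with a comma as the decimal separator, without touching the global locale. The names parse_c_double, format_with_comma, and CommaDecimal are hypothetical, and the CommaDecimal facet is only a stand-in so that the example does not depend on a Swedish locale being installed; in the lab you would imbue the stream with the Swedish locale object instead. Dates are not covered here.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Stand-in for a Swedish numeric facet: only the decimal separator changes.
struct CommaDecimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Parse a number written in the C locale, whatever the global locale is.
bool parse_c_double(const std::string& text, double& value) {
    std::istringstream in(text);
    in.imbue(std::locale::classic()); // affects this stream only
    return static_cast<bool>(in >> value);
}

// Format a number with ',' as decimal separator on this stream only.
std::string format_with_comma(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    out << std::fixed << std::setprecision(6) << value;
    return out.str();
}
```

Because imbue() applies to one stream object, other threads and other streams keep whatever locale they already had, which is the property the lab asks for.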