Stylistic advice to my students for writing a thesis

by Christoph Kessler, IDA, Linköping University, Sweden

This is a list of comments on some stylistic issues for master theses and similar documents that I am supposed to examine.
Most of these were compiled from a collection of frequently encountered stylistic errors in previous master theses.
My suggestions are based on scientific writing conventions used by international publishers in computer science and engineering.

Most of these points concern both English and Swedish.

As a general reference for proper writing in English, I recommend
Heffernan, Lincoln, Atwill: Writing, A College Handbook. 5th ed., Norton, 2001
or similar books.

General issues

What is Related Work?

Unfortunately, many students seem to confuse background literature with related work. The latter only applies to studies that try to answer the same or similar (related) research questions as your project, i.e., these are your "competitors" that you will compare your results with. In particular, manuals, textbooks, survey articles etc. for theory, systems and techniques that you (and likewise your competitors) use for the work are not related work (but should nevertheless be cited). These are background literature.
For the related work section in your thesis, you should consider, cite and discuss published approaches that address the same or a slightly different problem context, e.g., work that investigated the same question for a different processor architecture or programming language, or for a variant of the algorithmic problem considered that is, for example, more general or more specific than yours. Then you can relate your own experimental evaluations and conclusions to those works and explain the similarities and differences, strengths and weaknesses, thereby leading to a proper related work discussion chapter in your thesis and possibly stronger conclusions. The related work chapter should be at least 3-4 pages long.

Literature search for a final thesis project

I expect that you cite, and compare your work to, at least 5 (at least 10 for two-student theses) related-work scientific papers from peer-reviewed conferences or journals, in addition to background literature, method papers, textbooks, manuals, technical reports, web pages etc.
In addition, my department expects at least 7, preferably at least 10, references to peer-reviewed publications in a master thesis. This is a lower bound. Good master theses typically exceed it many times over.

I do not allow citing (and thus using material from) Wikipedia and other public web lexica, blogs etc. in final thesis reports. Such material is too volatile, not generally trustworthy and the presentation may be biased. For instance, I recently traced an error made in an exam by several students back to an erroneous article in such a web lexicon.
Wikipedia may still be useful as a starting (!) point for a first literature search about a topic, but you should read and cite the original articles mentioned there (and of course also others that are not mentioned by that article's authors).

Do not consider and do not cite papers from predatory conferences or journals. In the same way as your work gains in trust from citing strong, well-published papers and building upon them, citing papers from predatory conferences or journals adds "smell" to your work. At least, you will have to show awareness of such potential weaknesses where you discuss the quality of the literature sources used in your thesis.

Don't use copyrighted material without permission

The copyright holder (e.g., publisher) can sue you for economic compensation otherwise.

Stylistic issues

Pluralis majestatis

Never use "I" (except, maybe, in the preface where you sign with your name). Instead, use the so-called pluralis majestatis (more precisely, the author's "we", or pluralis auctoris), that is, "WE". "We" is used in the sense of "the author(s)" and sometimes also in the sense of "the reader and the author together as a team".

Examples: "We have shown in previous work [3,4] that..." - "We will see that..." - "Let us consider now ..."

If you want to mark something as your own personal opinion or experience, you should speak of yourself in the third person:
"The author thinks that ..."

Never use "you" to directly address the reader, and never command the reader by using the imperative form. Here, too, the third person is appropriate to keep the necessary distance:
"The reader may have noticed that ..." - "We refer the interested reader to ..."

Descriptive title

The title is the only piece of information that may really be read during a quick literature search. It should be in your own interest that your thesis is found by interested readers, as the number of papers citing you may be used as a measure of your scientific quality (even if this does not always make sense, but nowadays people are crazy about evaluating and measuring everything).

Hence, choose a descriptive title without abbreviations, even if the title becomes a bit longer.

Positive example: Parallelization of the interpreter in a test system for mobile telecommunication components

Orthography

Your thesis is a publication, with your name on it, that will exist forever. Hence, the thesis is not just another examination moment. It is a mirror of your personal way of working in daily life (just like your CV document and job application letters, by the way). Regardless of whether you work in industry or academia, you will have to write many technical documents in your professional life. In the same way as an application letter, your thesis tells the experienced reader (such as a potential employer or a reviewer) a lot about your attitude towards quality and accuracy, for example.

In your own interest, I am fussy about orthography, grammar, and consistency of terminology, and I will reject a thesis with more than 15 typos on a page.

Please use a spell checker and, if possible, ask someone else (not your opponent) to proofread your thesis before you hand it in to me.

Active voice

If possible, prefer the active voice to the passive voice. You will find that the active voice forces you to be more precise, which is always good. For instance, one could write

"It was shown that this problem is NP-complete."

If you use the active voice instead, you will be forced to name the author:

"Cook [23] has shown that this problem is NP-complete."

Colloquialisms and Contractions

Scientific publications and theses use formal language. Avoid colloquial phrases. Avoid contractions such as "can't", "doesn't", "let's" etc. This more conversational style is ok for blogs, tutorials and columns in popular magazines and newspapers, but not for scientific papers and theses. Use "cannot", "does not", "let us" instead.

Length

Material that is only of interest to very special readers (e.g., to your successors in the same project), such as commented listings, complete language grammars, or JavaDoc excerpts, should be moved to an appendix.

Structure

Structure your thesis into 5-10 chapters, each of which addresses a certain aspect of your work. Try to minimize cross-references between chapters. Avoid very short chapters or sections - these could be a sign that your structure is not very suitable.

An additional clustering of the chapters into multiple parts may be suitable for textbooks with more than 400 pages, but never for a thesis.

Sections and subsections are usually numbered up to a nesting depth of 3 or 4. Hence, "Section 3.1.4" still looks fine, while "Section 3.1.5.1.2.1.1" does not.
The standard LaTeX document classes enforce this by default: they stop numbering headings below a fixed nesting depth.
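
As a minimal sketch of how LaTeX limits the numbering depth: the standard classes keep the limit in the counter secnumdepth. The value 3 and the heading titles below are only illustrative assumptions, not a recommendation.

```latex
\documentclass{report}

% secnumdepth sets the deepest sectioning level that still receives a number.
% In the report class: 0 = chapter, 1 = section, 2 = subsection, 3 = subsubsection.
\setcounter{secnumdepth}{3}  % illustrative value: number down to \subsubsection
% tocdepth controls which levels are listed in the table of contents.
\setcounter{tocdepth}{2}

\begin{document}
\tableofcontents

\chapter{Background}               % numbered 1
\section{Parallel Architectures}   % numbered 1.1
\subsection{Memory Hierarchy}      % numbered 1.1.1
\subsubsection{Caches}             % numbered 1.1.1.1 because secnumdepth is 3
\paragraph{Write policies}         % level 4: stays unnumbered

\end{document}
```

With the class default for report (secnumdepth 2), the \subsubsection heading above would appear without a number.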

Keep it simple and short.

Avoid long sentences. Do not squeeze more than one thought into one sentence.

Avoid phrases that do not really add information. Keep the quality and precision of the presentation but minimize the length.

Complete sentences

Build complete sentences! A sentence without a predicate is incomplete and a sign of bad style in a scientific publication. Your thesis is not a collection of loose thoughts.

Bad example: Just this.

This rule applies also to enumerations. Hence, avoid incomplete sentences as in

Memory types:
* Data memory. Contains...
* Program memory. Is ...

Instead, you should write

The memory types are the following:
* Data memory. The data memory contains...
* Program memory. The program memory is ...

Quotes

Avoid quotes unless this is absolutely necessary for the description of your work. If you quote, give author name and citation.

Abbreviations

A human reader who is not very familiar with your project area can hardly keep more than 3 or 4 abbreviations in mind. The massive use of abbreviations, which is particularly dramatic in the IT (sorry, should read: information technology) industry, makes a thesis hard to read. Note also that, even within the same subarea of computer science, abbreviations may be ambiguous, such as ILP (integer linear programming, instruction-level parallelism).

Hence, choose a few very central abbreviations that you want to keep, introduce them with their full name properly before their first use, and spell out the others.

Punctuation and spaces

Please put a blank space after the period concluding a sentence. Not like this.This looks ugly and makes the text harder to read. The same rule also holds for commas, colons, semicolons etc. Citation symbols should appear as close as possible to the term [47] they refer to.
If you have a citation at the end of a (sub-)sentence like this one [5], the subsequent comma or period comes after the citation symbol [13].
There is always a blank space before a citation symbol.

Algorithms

Compact pseudocode (see your algorithms course book) is preferred to lengthy source code in Java or similar verbose programming languages.

Long listings in verbatim mode (typewriter font) are hard to read. I recommend using the available stylistic features to structure program code, such as boldface font for keywords, italics for variable and function names, slanted for comments, etc. LaTeX has good packages for typesetting algorithms.
Avoid overlong names for variables and constants, especially if you need to reference them in mathematical formulas. Line numbers should only be used if the describing plain text explicitly references them.

Avoid page breaks in algorithms / listings. Treat them as floating objects, like tables and figures.

Avoid excessively long algorithms or program listings. Partition the algorithm into suitable parts and package the parts into separate listings/figures.
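
The advice above can be sketched in LaTeX as follows. This is a minimal example that assumes the algorithm and algpseudocode packages, which are one common choice among several. The algorithm itself (maximum of an array) and the label alg:max are illustrative.

```latex
\documentclass{article}
\usepackage{algorithm}      % provides the floating algorithm environment
\usepackage{algpseudocode}  % provides \State, \For, \If, ... (algorithmicx bundle)

\begin{document}

\begin{algorithm}[tb]  % a float, placed at the top or bottom of a page like a figure
\caption{Maximum of a nonempty array $a[1..n]$ (illustrative example)}
\label{alg:max}
\begin{algorithmic}  % no optional [1] argument, so the lines stay unnumbered
  \Function{Max}{$a, n$}
    \State $m \gets a[1]$
    \For{$i \gets 2$ \textbf{to} $n$}
      \If{$a[i] > m$}
        \State $m \gets a[i]$
      \EndIf
    \EndFor
    \State \Return $m$
  \EndFunction
\end{algorithmic}
\end{algorithm}

Algorithm~\ref{alg:max} scans the array once.

\end{document}
```

Writing \begin{algorithmic}[1] instead would number every line, which is only worthwhile if the describing text refers to individual lines.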

If you present an algorithm that you invented,

Referencing chapters, sections, figures, tables, algorithms, etc.

"see Figure 3 for ..." - "as shown in Section 2.3"
Here, "Figure 3" is a proper name and thus the first character is capitalized.

but:
"see the third figure ..." - "see the previous section"
Here, "figure" is an ordinary noun.

Referencing literature

There are various styles of citations used in standard literature. The most common ones are numeric [13], alphanumeric [Ke92,KS88,KS88a] and name/year, such as Kessler (1998).
(For Word users: Never include first names of authors nor publication titles in the citation in the running text; these should only appear in the References section. Describe the main contributions of a cited work in your own words instead.)

Note that numeric citations are to be treated as comments (like footnote superscripts), not as independent text objects. I also recommend adding some text that lets the reader follow the sentence without looking up the corresponding bibitem in the bibliography.

Hence, you may write
"Applying Brent's theorem [7] yields ..." or "see Cormen et al. [4]"

but not
"[7] yields ..." nor "see [4]"
as [7] by itself is just an optional comment, not a subject.

A minor issue is that multiple numeric references should appear in order, that is, [3,4] instead of [4,3]. As the bibliography should be sorted in alphabetical order (of the first author's last name), this means citation symbols should preserve this alphabetical ordering, too.

Long flat lists of references are not a good citation style. For instance,
"A lot of loop transformations have been developed to increase data locality, for example [3], [5], [6], [7], [10], [11], [12], [13], [16], [17], [20], [21], [22]."
Instead, more effort should be put into reviewing and structuring the related work, even if this needs more time and space:
"Many loop transformations have been proposed in the literature of the last two decades. For instance, Kennedy et al. [10] give an in-depth treatment of loop interchange. Tiling of multidimensional loops is discussed in a seminal paper by Wolf and Lam [21]. Polychronopoulos [13] proposes cycle shrinking for loops with dependence cycles of static distances larger than one. Banerjee [5,6] gives an introduction to the theory of unimodular transformations. Ancourt and Irigoin [3] introduce the polytope model for the representation of index spaces, which is used in subsequent work by Lengauer [16], Xue [22], ... . See also Wolfe's textbook [20] for a comprehensive overview."

Please note also that there should be a blank space before each citation reference symbol, such as here [47] (in the LaTeX source: before each \cite{} command).
If the citation reference symbol comes last in a sentence, the period comes directly after it [19]. Never put a citation after the last period at the end of a paragraph; this has a strong smell of unreflective copy-and-paste (or worse, of plagiarism) of the whole paragraph.
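
In LaTeX source, the spacing and ordering rules for numeric citations can be sketched as follows. The example assumes the cite package for sorting. The keys keyA and keyB and both bibliography entries are placeholders, not real references.

```latex
\documentclass{article}
\usepackage{cite}  % prints the numbers of a multiple citation in ascending order

\begin{document}

% The tie (~) yields a blank space before the citation symbol
% and prevents a line break between the text and the symbol.
Applying Brent's theorem~\cite{keyB} yields the bound.

% The keys are given out of order here; the cite package still prints
% the resulting numbers in ascending order. The period follows the citation.
This question has been studied before~\cite{keyB,keyA}.

\begin{thebibliography}{9}
\bibitem{keyA} A. Author. \emph{Placeholder entry A}. Publisher, Year.
\bibitem{keyB} B. Author. \emph{Placeholder entry B}. Publisher, Year.
\end{thebibliography}

\end{document}
```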

Bibliography

The bibitems in the bibliography should be alphabetically ordered by author names. Alternatively, they may be ordered in the order they appear in the text (that is the unsrt bibstyle in LaTeX), but that makes it more difficult to look up bibitems that are referenced multiple times.

Avoid citing web documents; give preference to properly published (e.g., printed) books and articles. Note that web documents are (with very few exceptions) highly volatile, subject to dynamic update or removal without notice, and generally unrefereed; hence they are much less trustworthy than printed publications, which underwent a thorough reviewing process and can thus be regarded as correct and original work (also with very few exceptions). My own experience (admittedly, based on a small sample size) shows that, on average, a significant share of cited web page URLs have changed or no longer exist after a few years. A printed book (with an ISBN) or a journal article exists, in principle, forever. Likewise, the DOI for a published paper is expected to be valid for a long time.

For each (non-web) bibitem, give the author, title, the year of publication, and the publisher. The ISBN is optional and usually dropped. This is standard for scientific writing. Note that I override the IDA guidelines here.

For articles, give also journal name, volume, issue, and page numbers.
For papers in conference proceedings, give the conference name and the page numbers. For theses, technical reports and white papers, give the responsible institution/organization's name and location.
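
For BibTeX users, the fields listed above map onto entry types roughly as in the following sketch. All entries, keys, and the file name refs.bib are placeholders that only show which fields to fill in; they are not real references. The plain style sorts the bibliography alphabetically by author, while unsrt keeps the order of first citation.

```latex
\begin{filecontents}{refs.bib}
@book{bookkey,
  author    = {Family, Given},
  title     = {Placeholder Book Title},
  publisher = {Placeholder Publisher},
  year      = {2000}
}
@article{articlekey,
  author  = {Family, Given and Other, Name},
  title   = {Placeholder Article Title},
  journal = {Placeholder Journal Name},
  volume  = {1},
  number  = {2},
  pages   = {10--20},
  year    = {2001}
}
@inproceedings{paperkey,
  author    = {Other, Name},
  title     = {Placeholder Paper Title},
  booktitle = {Proceedings of the Placeholder Conference},
  pages     = {1--8},
  publisher = {Placeholder Publisher},
  year      = {2002}
}
@techreport{reportkey,
  author      = {Family, Given},
  title       = {Placeholder Report Title},
  institution = {Placeholder University},
  address     = {Placeholder City},
  year        = {2003}
}
\end{filecontents}

\documentclass{article}
\begin{document}

Placeholder text citing all four entry types~\cite{bookkey,articlekey,paperkey,reportkey}.

\bibliographystyle{plain}    % bibliography sorted alphabetically by author
% \bibliographystyle{unsrt}  % alternative: entries in order of first citation
\bibliography{refs}

\end{document}
```

In the article entry, the number field holds the issue.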

If you cannot avoid citing a web document (such as online documentation), give the author/organization, the year, and the location of the project (e.g., university or research institute).

By the way, the publisher information tells the experienced reader a lot about the trustworthiness of a publication (and thus, whether it is worth the effort to get access to it).

LaTeX

I recommend that you learn and use LaTeX. It is more powerful and forces you to follow stylistic and typographic standards better than any WYSIWYG word processor ever can. You will find LaTeX easy to learn if you know any programming language.
Using LaTeX is even more or less mandatory if you are a graduate student in computer science and engineering.

To be continued...

Christoph Kessler