Jannis Teunissen



blog:computational_scientist [2019/05/03 21:07] (current)
 +==== Becoming a computational scientist ====
 +//Note: the text below is copied from Appendix A of my [[publications:thesis|PhD
 +thesis]] from 2015.//
 +I have now been working on computational science and computational physics
 +problems for more than six years. What has surprised me, in hindsight, is the
 +great number of things that one has to be familiar with in order to be
 +productive. The reason is probably the relatively large amount of DIY (do it
 +yourself) in computing, compared to other disciplines. Below I will try to
 +summarize what skills I have found to be generally important.
 +=== Selecting problems ===
 +For doing research, the most important skill is perhaps the ability to pick the
 +right problems. 
 +This is a rather difficult skill to master, and I am confident that I have not
 +done so yet. 
 +Still, there are a couple of simple questions that I find useful for selecting problems:
 +  * Are you interested in the problem?
 +  * Are others interested in the problem?
 +  * Do you expect to learn something useful when studying the problem?
 +  * Does the problem seem feasible to you? If this is not clear, how much time do you approximately have to invest to answer this?
 +  * Suppose that everything works out: you solve the problem. What would that mean to you? And what could you do next?
 +  * How long do you give yourself? And suppose that you are unable to solve the full problem, is there then an intermediate result that could be of value?
 +  * How hard will it be to write about the results? For example, for certain types of results a carefully written introduction, motivation, discussion or analysis might be required.
 +Another question that becomes more relevant towards the end of a PhD is whether
 +it is possible to obtain (future) grants or funding for a topic.
 +=== Theoretical skills ===
 +Below, I briefly discuss some of the theoretical topics that I believe to be
 +important for a computational scientist. 
 +The most important topic is missing, however: knowledge of the domain that
 +you are working in.
 +Such knowledge will help in selecting the right problems and in making the right
 +== Applied mathematics ==
 +These are some of the topics in applied mathematics that I think are important
 +for a computational scientist:
 +  * Linear algebra: many problems can be written as a system of linear equations.
 +  * Calculus: ordinary differential equations and Taylor series are important for many numerical methods. Calculus also helps you judge what can be calculated analytically and construct reference solutions.
 +  * Statistics: Monte Carlo methods are quite common; to work with them, at least a basic understanding of statistics is required. The same goes for problems that are probabilistic in nature or contain data with noise.
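As an illustration of why some statistics is needed for Monte Carlo work, here is a minimal sketch (not taken from the thesis) that estimates pi by random sampling and reports a standard error. The function name ''estimate_pi'' and the fixed seed are assumptions made for this example only.

```python
import math
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    Returns the estimate together with its standard error.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    p = hits / n_samples  # fraction of points inside the quarter circle
    estimate = 4.0 * p
    # Each sample is a Bernoulli trial, so the variance of p is p(1-p)/n
    std_error = 4.0 * math.sqrt(p * (1.0 - p) / n_samples)
    return estimate, std_error


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        est, err = estimate_pi(n)
        print(f"n = {n:>7}: pi ~ {est:.5f} +/- {err:.5f}")
```

The error decreases only as one over the square root of the number of samples, so each extra digit of accuracy costs roughly a hundred times more work. Without the error estimate it would be hard to tell how much of the result can be trusted.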
 +== Computer science ==
 +When we want to solve a problem on a computer, we have to select the appropriate
 +//algorithm//. Algorithms can be classified by their 'difficulty' or
 +computational cost, which is the main topic of //computational complexity
 +theory//. Knowing and understanding the computational cost of algorithms is not
 +only important for efficiently solving a problem, but also for predicting what
 +problems are feasible. For example, if you recognize that you are trying to
 +solve an [[https://en.wikipedia.org/wiki/NP-hard|NP-hard]] problem, then you
 +immediately know that exact solutions are limited to small problem sizes. With parallel
 +computing, it is usually possible to go to larger problem sizes. To what extent
 +this is the case depends on how well the algorithmic components can be
 +parallelized, i.e., on the amount of local computation versus global
 +communication.
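One common way to make the limits of parallel computing concrete is Amdahl's law, which the text above does not name; it is used here as a deliberately simplified model. The sketch assumes that a fixed fraction of the work parallelizes perfectly and ignores communication cost entirely, so real codes will typically do worse. The function name ''amdahl_speedup'' is chosen for this example.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup when only part of the work can run in parallel.

    Simplified model: the parallel part scales perfectly and there is
    no communication overhead.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Even with 95% parallel work, the speedup saturates below 20
    for n in (1, 8, 64, 1024):
        print(f"{n:>5} workers: speedup {amdahl_speedup(0.95, n):.2f}")
```

In this model the serial part sets a hard ceiling: with 5% serial work, no number of processors gives more than a factor 20.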
 +The practical cost of algorithms also has to do with the device that performs
 +the algorithmic steps or computations. Modern processors operate in a rather
 +complicated way, but knowledge of the cost of typical operations is important
 +when you have to develop an efficient numerical method. The hardware in a
 +processor also determines what integer and floating point numbers you can use.
 +Understanding floating point arithmetic and its subtleties can save you a lot of
 +time debugging 'weird' behavior.
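A few of these floating point subtleties can be shown in a handful of lines. The sketch below assumes IEEE 754 double precision, which is what Python's ''float'' uses on common hardware; the specific numbers are chosen for illustration.

```python
import math
import sys

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Machine epsilon: the gap between 1.0 and the next representable double
eps = sys.float_info.epsilon
print(1.0 + eps / 2 == 1.0)          # True: the increment is rounded away

# Addition is not associative: a small term can be absorbed by a large one
print((1e16 + 1.0) - 1e16)           # 0.0
print((1e16 - 1e16) + 1.0)           # 1.0

# Rounding errors accumulate in a running sum; math.fsum tracks them exactly
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)                  # False
print(math.fsum([0.1] * 10) == 1.0)  # True
```

None of this is a bug in the language or the hardware. The practical consequence is that floating point results should be compared with a tolerance, and that the order of operations can matter.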
 +== Computational science ==
 +Although there are many types of computations, most of them fall into just a
 +few categories:
 +  * Solving linear systems of equations, i.e., solve $A x = b$ for a given matrix $A$ and vector $b$. Surprisingly many problems can be transformed into such a linear system.
 +  * Optimization, for example: find the shortest path between $N$ cities, find the ground state energy of a quantum system or find the minimum of a function.
 +  * Ordinary and partial differential equations. Many (physical) systems can be described by such equations. Different types of partial differential equations require quite different solution strategies.
 +A computational scientist should probably be familiar with the basic methods for
 +solving problems from these categories, so that one is able to find and select
 +the best methods when the need arises.
 +To prevent reinventing the wheel, some knowledge of the available libraries and
 +codes is valuable.
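To make the first category concrete, here is a minimal sketch of Gaussian elimination with partial pivoting for solving $A x = b$, written in plain Python so that it is self-contained. It is meant to show the basic method only: for real work an established library (for example a LAPACK-based routine) would be the better choice. The function name ''solve_linear_system'' is an assumption for this example.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    `a` is a list of n rows of length n, `b` a list of length n.
    The inputs are not modified.
    """
    n = len(b)
    # Build the augmented matrix [A | b] from copies of the input rows
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


if __name__ == "__main__":
    # 2x + y = 3 and x + 3y = 5, so x is approximately 0.8 and y 1.4
    print(solve_linear_system([[2, 1], [1, 3]], [3, 5]))
```

The three nested loops mean that the cost grows as the cube of the number of unknowns, which is one reason why large or sparse systems call for other methods.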
 +=== Practical skills ===
 +The best strategy for solving a problem depends on what tools are already
 +available. If sufficiently many other people have worked on a (similar) problem,
 +software might be available that you can directly use. Take for example CFD
 +(computational fluid dynamics), for which there are many different simulation
 +tools. Selecting the right one then becomes one of the most important aspects of
 +solving your problem.
 +The other extreme would be that no software exists for your problem, so
 +that you have to develop everything yourself. There are of course also many
 +cases in between, for example when existing tools have to be modified to suit
 +your needs. This means that it is often necessary to write computer code. Below,
 +some of the practical aspects of writing your own code and reusing others' code
 +are discussed.
 +== Computer basics ==
 +For computing, the ''*nix'' operating systems appear to be most popular.
 +Being familiar with a variant such as GNU/Linux, BSD or OS X is therefore quite
 +helpful -- this allows you to quickly use the code and tools that others have
 +developed.
 +Good command of a text editor such as ''vim'' or ''emacs'', or a suitable IDE
 +(integrated development environment) will speed up your code and text editing.
 +This might also reduce the risk of developing RSI (repetitive strain injury),
 +because most editors can be operated without a mouse ((In my experience, the
 +combination of stress and mouse usage is most likely to cause physical
 +discomfort.)). There are many useful tools included in a ''*nix'' system, but
 +''ssh'' gets a special mention, because it allows you to work on remote systems.
 +There exist a number of software suites for doing numerical or symbolic
 +computations. Examples of commercial packages are MATLAB and Mathematica,
 +whereas [[https://www.gnu.org/software/octave/|Octave]] or
 +[[http://www.sagemath.org/|SageMath]] are examples of free software
 +alternatives. The many built-in functions can help you to quickly develop a
 +computational method. Even if you eventually have to implement this solver in a
 +different environment, it can be helpful to start from a simple
 +proof-of-concept. The generality of such suites is also their drawback:
 +typically they will not be as efficient as a special purpose solution.
 +When you develop a method from scratch, you can use your preferred programming
 +language -- this is of course not possible when you have to modify an existing
 +method. The traditional languages for computing are C and Fortran. C in
 +particular is quite 'low-level', so experience with C will be useful for understanding
 +how a computer and other languages work. Fortran was specifically designed for
 +numerical computing, which can make code development more convenient. Another
 +popular //compiled// language is C++, which allows for many programming
 +styles. This flexibility can be good for the expert but is sometimes hard for
 +the beginner. Performance-wise, there are no major differences between these
 +languages as long as you know what you are doing.
 +For certain tasks, scripting or interpreted languages such as Python can be more
 +convenient.
 +Such languages can for example be used to glue together other programs, to
 +process data or to visualize results.
 +Python can also be used for computations, although the numerical work is then
 +typically performed by routines written in C or Fortran, which are made
 +available by Python modules such as ''numpy''.
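As a sketch of what 'gluing together' programs can look like, the example below runs an external program, captures its output and computes some simple statistics. To keep it self-contained, a child Python process stands in for the real simulation code; the helper name ''run_and_collect'' and the one-number-per-line output format are assumptions made for this example.

```python
import statistics
import subprocess
import sys


def run_and_collect(command):
    """Run an external program and parse one number per line of its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return [float(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in for a real solver: a child process that prints a few numbers.
    # In practice this would be the command line of your compiled executable.
    fake_solver = [sys.executable, "-c", "for i in range(5): print(i * 0.5)"]
    values = run_and_collect(fake_solver)
    print("mean:", statistics.mean(values), "max:", max(values))
```

With ''check=True'' the script stops with an exception when the external program fails, which is usually preferable to silently processing incomplete output.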
 +Numerical code is no different from other code: many things can go wrong.
 +Sometimes a program simply does not compile or run, but at other times it might
 +not be clear whether there is a //bug// or whether there is a failure for
 +another reason. Code often depends on (particular versions of) libraries, which
 +is a source of compilation errors; understanding how code is compiled will help
 +in figuring out what is required. Another example is the
 +//Makefiles//((Makefiles contain rules that describe how a collection of source
 +files should be compiled. Another common build system is ''CMake''.)) included
 +with numerical software: they might not work on your machine, in which case you
 +need to know how to modify them. As most programs contain bugs, basic debugging
 +skills are very valuable. The larger a project grows, the more important these
 +skills become.
 +Being familiar with a version control system such as git has various benefits:
 +you can keep track of your changes, get the latest version of a code,
 +collaborate with others, and so on. Perhaps even more important is being able to
 +visualize your results. There exist many tools for this; examples of popular
 +open source packages are gnuplot, [[https://visit.llnl.gov/|VisIt]] and