Jannis Teunissen


blog:computational_scientist [2019/05/03 21:07] (current)

==== Becoming a computational scientist ====

//Note: the text below is copied from Appendix A of my [[publications:thesis|PhD thesis]] from 2015.//

I have now been working on computational science and computational physics problems for more than six years. What has surprised me, in hindsight, is the great number of things that one has to be familiar with in order to be productive. The reason is probably the relatively large amount of DIY (do it yourself) in computing, compared to other disciplines. Below I will try to summarize what skills I have found to be generally important.

=== Selecting problems ===

For doing research, the most important skill is perhaps the ability to pick the right problems. This is a rather difficult skill to master, and I am confident that I have not done so yet. Still, there are a couple of simple questions that I find useful for selecting problems:

  * Are you interested in the problem?
  * Are others interested in the problem?
  * Do you expect to learn something useful when studying the problem?
  * Does the problem seem feasible to you? If this is not clear, how much time do you approximately have to invest to answer this?
  * Suppose that everything works out: you solve the problem. What would that mean to you? And what could you do next?
  * How long do you give yourself? And suppose that you are unable to solve the full problem, is there then an intermediate result that could be of value?
  * How hard will it be to write about the results? For example, for certain types of results a carefully written introduction, motivation, discussion or analysis might be required.

Another question that becomes more relevant towards the end of a PhD is whether it is possible to obtain (future) grants or funding for a topic.

===Theoretical skills===

Below, I briefly discuss some of the theoretical topics that I believe to be important for a computational scientist. The most important topic is missing, however: knowledge of the domain that you are working in. Such knowledge will help in selecting the right problems and in making the right approximations.

====Applied mathematics====

These are some of the topics in applied mathematics that I think are important for a computational scientist:

  * Linear algebra: many problems can be written as a system of linear equations.
  * Calculus: ordinary differential equations and Taylor series are important for many numerical methods. Calculus also helps in knowing what can be calculated analytically and in constructing reference solutions.
  * Statistics: Monte Carlo methods are quite common; to work with them, at least a basic understanding of statistics is required. The same goes for problems that are probabilistic in nature or contain data with noise.

====Computer science====

When we want to solve a problem on a computer, we have to select the appropriate //algorithm//. Algorithms can be classified by their 'difficulty' or computational cost, which is the main topic of //computational complexity theory//. Knowing and understanding the computational cost of algorithms is not only important for efficiently solving a problem, but also for predicting what problems are feasible. For example, if you recognize that you are trying to solve an [[https://en.wikipedia.org/wiki/NP-hard|NP-hard]] problem, then you immediately know that you are limited to small problem sizes. With parallel computing, it is usually possible to go to larger problem sizes. To what extent this is the case depends on how well the algorithmic components can be parallelized, i.e., on the amount of local computation versus global communication.
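
To make the remark about NP-hard problems concrete, the sketch below counts the candidate solutions that a brute-force search would have to examine for the travelling salesman problem (finding the shortest round trip through $N$ cities), a standard NP-hard problem. The choice of this problem, the function name ''count_tours'' and the comparison with an $N^3$ cost are assumptions made for illustration; they do not come from the text above.

```python
import math


def count_tours(n_cities: int) -> int:
    """Count distinct round trips through n_cities with symmetric distances.

    Fixing the start city leaves (n_cities - 1)! orderings of the others;
    every tour is then counted twice (once per direction).
    """
    if n_cities < 3:
        raise ValueError("need at least three cities for a non-trivial tour")
    return math.factorial(n_cities - 1) // 2


if __name__ == "__main__":
    # Compare brute-force growth with a polynomial cost such as N^3
    for n in (5, 10, 15, 20):
        print(f"N = {n:2d}: {count_tours(n):>20d} tours, N^3 = {n**3}")
```

Already for 20 cities there are about $6 \times 10^{16}$ tours, so exhaustive search is only an option for small $N$.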

The practical cost of algorithms also has to do with the device that performs the algorithmic steps or computations. Modern processors operate in a rather complicated way, but knowledge of the cost of typical operations is important when you have to develop an efficient numerical method. The hardware in a processor also determines what integer and floating point numbers you can use. Understanding floating point arithmetic and its subtleties can save you a lot of time debugging 'weird' behavior.

====Computational science====

Although there are many types of computations, most of them fall into just a few categories:

  * Solving linear systems of equations, i.e., solving $A x = b$ for a given matrix $A$ and vector $b$. Surprisingly many problems can be transformed into such a linear system.
  * Optimization, for example: find the shortest path between $N$ cities, find the ground state energy of a quantum system or find the minimum of a function.
  * Ordinary and partial differential equations. Many (physical) systems can be described by such equations. Different types of partial differential equations require quite different solution strategies.

A computational scientist should probably be familiar with the basic methods for solving problems from these categories, so that one is able to find and select the best methods when the need arises. To avoid reinventing the wheel, some knowledge of the available libraries and codes is valuable.

===Practical skills===

The best strategy for solving a problem depends on what tools are already available. If sufficiently many other people have worked on a (similar) problem, software might be available that you can directly use. Take for example CFD (computational fluid dynamics), for which there are many different simulation tools. Selecting the right one then becomes one of the most important aspects of solving your problem.

The other extreme would be that no software exists for your problem, so that you have to develop everything yourself. There are of course also many cases in between, for example when existing tools have to be modified to suit your needs. This means that it is often necessary to write computer code. Below, some of the practical aspects of writing your own code and reusing others' code are discussed.

====Computer basics====

For computing, the ''*nix'' operating systems appear to be most popular. Being familiar with a variant such as GNU/Linux, BSD or OS X is therefore quite helpful -- this allows you to quickly use the code and tools that others have written.

Good command of a text editor such as ''vim'' or ''emacs'', or a suitable IDE (integrated development environment) will speed up your code and text editing. This might also reduce the risk of developing RSI (repetitive strain injury), because most editors can be operated without a mouse ((In my experience, the combination of stress and mouse usage is most likely to cause physical discomfort.)). There are many useful tools included in a ''*nix'' system, but ''ssh'' gets a special mention, because it allows you to work on remote systems.

There exist a number of software suites for doing numerical or symbolic computations. Examples of commercial packages are Matlab and Mathematica, whereas [[https://www.gnu.org/software/octave/|Octave]] and [[http://www.sagemath.org/|SageMath]] are examples of free software alternatives. The many built-in functions can help you to quickly develop a computational method. Even if you eventually have to implement this solver in a different environment, it can be helpful to start from a simple proof-of-concept. The generality of such suites is also their drawback: typically they will not be as efficient as a special purpose solution.
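
As an illustration of such a proof-of-concept, the sketch below uses plain Python to integrate a test equation with the forward Euler method and compares the result with a reference solution that is known analytically. The method, the test equation $y' = -y$ with $y(0) = 1$, and the names ''forward_euler'' and ''decay'' are assumptions chosen for illustration, not something prescribed by the text.

```python
import math


def forward_euler(f, y0, t_end, n_steps):
    """Integrate dy/dt = f(t, y) from t = 0 to t_end using n_steps equal steps."""
    dt = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += dt * f(t, y)  # explicit (forward) Euler update
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test equation dy/dt = -y."""
    return -y


if __name__ == "__main__":
    exact = math.exp(-1.0)  # analytic reference solution y(1) = exp(-1)
    for n_steps in (10, 100, 1000):
        approx = forward_euler(decay, 1.0, 1.0, n_steps)
        print(n_steps, approx, abs(approx - exact))
```

The error shrinks by roughly a factor of ten for every tenfold increase in the number of steps, as expected for a first-order method. A check like this is cheap to do before investing in an optimized implementation.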

====Programming====

When you develop a method from scratch, you can use your preferred programming language -- this is of course not possible when you have to modify an existing method. The traditional languages for computing are C and Fortran. C in particular is quite 'low-level', so that experience with C will be useful for understanding how a computer and other languages work. Fortran was specifically designed for numerical computing, which can make code development more convenient. Another popular //compiled// language is C++, which allows for many programming styles. This flexibility can be good for the expert but is sometimes hard for the beginner. Performance-wise, there are no major differences between these languages as long as you know what you are doing.

For certain tasks, scripting or interpreted languages such as Python can be more convenient. Such languages can for example be used to glue together other programs, to process data or to visualize results. Python can also be used for computations, although the numerical work is then typically performed by routines written in C or Fortran, which are made available by Python modules such as ''numpy''.

Numerical code is no different from other code: many things can go wrong. Sometimes a program simply does not compile or run, but at other times it might not be clear whether there is a //bug// or whether there is a failure for another reason. Code often depends on (particular versions of) libraries, which is a source of compilation errors; understanding how code is compiled will help in figuring out what is required. Another example is the //Makefiles//((Makefiles contain rules that describe how a collection of source files should be compiled. Another common build system is ''CMake''.)) included with numerical software: they might not work on your machine, in which case you need to know how to modify them. As most programs contain bugs, basic debugging skills are very valuable. The larger a project grows, the more important these skills become.

Being familiar with a version control system such as git has various benefits: you can keep track of your changes, get the latest version of a code, collaborate with others, and so on. Perhaps even more important is being able to visualize your results. There exist many tools for this; popular open source packages include gnuplot, [[https://visit.llnl.gov/|Visit]] and [[https://www.paraview.org/|Paraview]].