— |
blog:computational_scientist [2019/05/03 21:07] (current) |
||
---|---|---|---|

+ | ==== Becoming a computational scientist ==== | ||

+ | |||

+ | //Note: the text below is copied from Appendix A of my [[publications:thesis|PhD | ||

+ | thesis]] from 2015.// | ||

+ | |||

+ | I have now been working on computational science and computational physics | ||

+ | problems for more than six years. What has surprised me, in hindsight, is the | ||

+ | great number of things that one has to be familiar with in order to be | ||

+ | productive. The reason is probably the relatively large amount of DIY (do it | ||

+ | yourself) in computing, compared to other disciplines. Below I will try to | ||

+ | summarize what skills I have found to be generally important. | ||

+ | |||

+ | === Selecting problems === | ||

+ | |||

+ | For doing research, the most important skill is perhaps the ability to pick the | ||

+ | right problems. | ||

+ | This is a rather difficult skill to master, and I am confident that I have not | ||

+ | done so yet. | ||

+ | Still, there are a couple of simple questions that I find useful for selecting | ||

+ | problems: | ||

+ | |||

+ | * Are you interested in the problem? | ||

+ | * Are others interested in the problem? | ||

+ | * Do you expect to learn something useful when studying the problem? | ||

+ | * Does the problem seem feasible to you? If this is not clear, roughly how much time would you have to invest to find out? | ||

+ | * Suppose that everything works out: you solve the problem. What would that mean to you? And what could you do next? | ||

+ | * How long do you give yourself? And if you are unable to solve the full problem, is there an intermediate result that could be of value? | ||

+ | * How hard will it be to write about the results? For example, for certain types of results a carefully written introduction, motivation, discussion or analysis might be required. | ||

+ | |||

+ | Another question that becomes more relevant towards the end of a PhD is whether | ||

+ | it is possible to obtain (future) grants or funding for a topic. | ||

+ | |||

+ | ===Theoretical skills=== | ||

+ | |||

+ | Below, I briefly discuss some of the theoretical topics that I believe to be | ||

+ | important for a computational scientist. | ||

+ | The most important topic is missing, however: knowledge of the domain that | ||

+ | you are working in. | ||

+ | Such knowledge will help in selecting the right problems and in making the right | ||

+ | approximations. | ||

+ | |||

+ | ==Applied mathematics== | ||

+ | |||

+ | These are some of the topics in applied mathematics that I think are important | ||

+ | for a computational scientist: | ||

+ | |||

+ | * Linear algebra: many problems can be written as a system of linear equations. | ||

+ | * Calculus: ordinary differential equations and Taylor series are important for many numerical methods. Calculus also helps in knowing what can be calculated analytically and in constructing reference solutions. | ||

+ | * Statistics: Monte Carlo methods are quite common; to work with them, at least a basic understanding of statistics is required. The same goes for problems that are probabilistic in nature or contain data with noise. | ||
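
The role of reference solutions and Taylor series can be illustrated with a small sketch. It assumes a test problem that is not in the text above, $y' = -y$ with $y(0) = 1$, whose analytic solution $y(t) = e^{-t}$ serves as the reference. The forward Euler method used here follows from truncating the Taylor series after the linear term; the function names are hypothetical.

```python
import math

def euler(f, y0, t_end, n_steps):
    """Integrate y' = f(t, y) from t = 0 to t_end with forward Euler."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += h * f(t, y)  # first-order Taylor step: y(t + h) ~ y + h * y'
        t += h
    return y

def decay(t, y):
    """Right-hand side of the assumed test problem y' = -y."""
    return -y

# Analytic reference solution at t = 1
exact = math.exp(-1.0)

# Error at t = 1 for an increasing number of steps
errors = {n: abs(euler(decay, 1.0, 1.0, n) - exact) for n in (10, 20, 40, 80)}
for n, err in errors.items():
    print(f"n = {n:3d}  error = {err:.2e}")
```

Doubling the number of steps should roughly halve the error, which is the behavior expected from a first-order method. Without the analytic reference it would be much harder to tell whether the implementation is correct.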

+ | |||

+ | ==Computer science== | ||

+ | |||

+ | When we want to solve a problem on a computer, we have to select the appropriate | ||

+ | //algorithm//. Algorithms can be classified by their 'difficulty' or | ||

+ | computational cost, which is the main topic of //computational complexity | ||

+ | theory//. Knowing and understanding the computational cost of algorithms is not | ||

+ | only important for efficiently solving a problem, but also for predicting what | ||

+ | problems are feasible. For example, if you recognize that you are trying to | ||

+ | solve an [[https://en.wikipedia.org/wiki/NP-hard|NP-hard]] problem, then you | ||

+ | immediately know that exact solutions are likely limited to small problem sizes. With parallel | ||

+ | computing, it is usually possible to go to larger problem sizes. To what extent | ||

+ | this is the case depends on how well the algorithmic components can be | ||

+ | parallelized, i.e., on the amount of local computation versus global | ||

+ | communication. | ||
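
To get a feeling for why problem sizes stay small, consider the following sketch. It assumes the most naive exact method for the traveling salesman problem, namely checking every round trip, and an assumed throughput of $10^9$ tours per second; both are illustrative choices, not measurements.

```python
import math

def n_tours(n_cities):
    """Number of distinct round trips through n_cities (n_cities >= 3)."""
    # Fix the starting city and ignore the direction of travel
    return math.factorial(n_cities - 1) // 2

TOURS_PER_SECOND = 1e9  # assumed throughput, for illustration only

for n in (5, 10, 15, 20, 25):
    seconds = n_tours(n) / TOURS_PER_SECOND
    print(f"{n:2d} cities: {n_tours(n):.3e} tours, about {seconds:.3g} s")
```

Under these assumptions, a thousandfold speedup from parallel computing buys roughly two extra cities: going from 20 to 22 cities multiplies the number of tours by 420, and going to 23 cities multiplies it by 9240. Smarter exact algorithms exist, but their cost still grows exponentially with the number of cities.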

+ | |||

+ | The practical cost of algorithms also has to do with the device that performs | ||

+ | the algorithmic steps or computations. Modern processors operate in a rather | ||

+ | complicated way, but knowledge of the cost of typical operations is important | ||

+ | when you have to develop an efficient numerical method. The hardware in a | ||

+ | processor also determines what integer and floating point numbers you can use. | ||

+ | Understanding floating point arithmetic and its subtleties can save you a lot of | ||

+ | time debugging 'weird' behavior. | ||
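
A few of these subtleties can be reproduced in a short sketch, assuming the usual IEEE 754 double precision numbers that Python's ``float`` maps to on common hardware. The variable names are only for illustration.

```python
import math

# Decimal fractions such as 0.1 have no exact binary representation
a = 0.1 + 0.2
exact_equal = (a == 0.3)             # False
close_enough = math.isclose(a, 0.3)  # True: compare with a tolerance instead

# Addition is not associative: the order of operations matters
big, small = 1.0e16, 1.0
lost = (big + small) - big           # 0.0, small is absorbed by big
kept = (big - big) + small           # 1.0

# Rounding errors accumulate in a naive sum
values = [0.1] * 10
naive_is_one = (sum(values) == 1.0)        # False
fsum_is_one = (math.fsum(values) == 1.0)   # True: fsum tracks the lost digits

print(exact_equal, close_enough, lost, kept, naive_is_one, fsum_is_one)
```

None of this is a bug in the language or the processor. It is a consequence of storing real numbers with a finite number of binary digits, which is why comparisons with a tolerance are usually preferable to exact equality tests.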

+ | |||

+ | ==Computational science== | ||

+ | |||

+ | Although there are many types of computations, most of them fall into just a | ||

+ | few categories: | ||

+ | |||

+ | * Solving linear systems of equations, i.e., finding the vector $x$ that satisfies $A x = b$ for a given matrix $A$ and vector $b$. Surprisingly many problems can be transformed into such a linear system. | ||

+ | * Optimization, for example: find the shortest path between $N$ cities, find the ground state energy of a quantum system or find the minimum of a function. | ||

+ | * Ordinary and partial differential equations. Many (physical) systems can be described by such equations. Different types of partial differential equations require quite different solution strategies. | ||

+ | |||

+ | A computational scientist should probably be familiar with the basic methods for | ||

+ | solving problems from these categories, so that one is able to find and select | ||

+ | the best methods when the need arises. | ||

+ | To prevent reinventing the wheel, some knowledge of the available libraries and | ||

+ | codes is valuable. | ||
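
As an example of such a basic method, here is a minimal sketch of Gaussian elimination with partial pivoting for the first category, solving $A x = b$. The function name and the small test system are illustrative assumptions. For real work, a library routine (for example from LAPACK, which is also what ''numpy.linalg.solve'' relies on) is almost always the better choice.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [A | b] so the inputs are left untouched
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]

        # Eliminate the entries below the pivot
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]

    # Back substitution on the resulting upper triangular system
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - s) / M[row][row]
    return x

# Small test system whose solution is x = (2, 3, -1)
A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = solve(A, b)
print(x)
```

The cost of this method grows with the cube of the number of unknowns, so for large or sparse systems other methods (and the libraries that implement them) become essential.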

+ | |||

+ | ===Practical skills=== | ||

+ | |||

+ | The best strategy for solving a problem depends on what tools are already | ||

+ | available. If sufficiently many other people have worked on a (similar) problem, | ||

+ | software might be available that you can directly use. Take for example CFD | ||

+ | (computational fluid dynamics), for which there are many different simulation | ||

+ | tools. Selecting the right one then becomes one of the most important aspects of | ||

+ | solving your problem. | ||

+ | |||

+ | The other extreme would be that no software exists for your problem, so | ||

+ | that you have to develop everything yourself. There are of course also many | ||

+ | cases in between, for example when existing tools have to be modified to suit | ||

+ | your needs. This means that it is often necessary to write computer code. Below, | ||

+ | some of the practical aspects of writing your own code and reusing others' code | ||

+ | are discussed. | ||

+ | |||

+ | ==Computer basics== | ||

+ | |||

+ | For computing, the ''*nix'' operating systems appear to be the most popular. | ||

+ | Being familiar with a variant such as GNU/Linux, BSD or OS X is therefore quite | ||

+ | helpful -- this allows you to quickly use the code and tools that others have | ||

+ | written. | ||

+ | |||

+ | Good command of a text editor such as ''vim'' or ''emacs'', or a suitable IDE | ||

+ | (integrated development environment) will speed up your code and text editing. | ||

+ | This might also reduce the risk of developing RSI (repetitive strain injury), | ||

+ | because most editors can be operated without a mouse ((In my experience, the | ||

+ | combination of stress and mouse usage is most likely to cause physical | ||

+ | discomfort.)). There are many useful tools included in a ''*nix'' system, but | ||

+ | ''ssh'' gets a special mention, because it allows you to work on remote systems. | ||

+ | |||

+ | There exist a number of software suites for doing numerical or symbolic | ||

+ | computations. Examples of commercial packages are Matlab and Mathematica, | ||

+ | whereas [[https://www.gnu.org/software/octave/|Octave]] or | ||

+ | [[http://www.sagemath.org/|SageMath]] are examples of free software | ||

+ | alternatives. The many built-in functions can help you to quickly develop a | ||

+ | computational method. Even if you eventually have to implement this solver in a | ||

+ | different environment, it can be helpful to start from a simple | ||

+ | proof-of-concept. The generality of such suites is also their drawback: | ||

+ | typically they will not be as efficient as a special purpose solution. | ||

+ | |||

+ | ==Programming== | ||

+ | |||

+ | When you develop a method from scratch, you can use your preferred programming | ||

+ | language -- this is of course not possible when you have to modify an existing | ||

+ | method. The traditional languages for computing are C and Fortran. Especially C | ||

+ | is quite 'low-level', so that experience with C will be useful for understanding | ||

+ | how a computer and other languages work. Fortran was specifically designed for | ||

+ | numerical computing, which can make code development more convenient. Another | ||

+ | popular //compiled// language is C++, which allows for many programming | ||

+ | styles. This flexibility can be good for the expert but is sometimes hard for | ||

+ | the beginner. Performance-wise, there are no major differences between these | ||

+ | languages as long as you know what you are doing. | ||

+ | |||

+ | For certain tasks, scripting or interpreted languages such as Python can be more | ||

+ | convenient. | ||

+ | Such languages can for example be used to glue together other programs, to | ||

+ | process data or to visualize results. | ||

+ | Python can also be used for computations, although the numerical work is then | ||

+ | typically performed by routines written in C or Fortran, which are made | ||

+ | available by Python modules such as ''numpy''. | ||

+ | |||

+ | Numerical code is no different from other code: many things can go wrong. | ||

+ | Sometimes a program simply does not compile or run, but at other times it might | ||

+ | not be clear whether there is a //bug// or whether there is a failure for | ||

+ | another reason. Code often depends on (particular versions of) libraries, which | ||

+ | is a source of compilation errors; understanding how code is compiled will help | ||

+ | in figuring out what is required. Another example is the | ||

+ | //Makefiles//((Makefiles contain rules that describe how a collection of source | ||

+ | files should be compiled. Another common build system is ''CMake''.)) included | ||

+ | with numerical software: they might not work on your machine, in which case you | ||

+ | need to know how to modify them. As most programs contain bugs, basic debugging | ||

+ | skills are very valuable. The larger a project grows, the more important these | ||

+ | skills become. | ||

+ | |||

+ | Being familiar with a version control system such as git has various benefits: | ||

+ | you can keep track of your changes, get the latest version of a code, | ||

+ | collaborate with others, and so on. Perhaps even more important is being able to | ||

+ | visualize your results. There exist many tools for this; examples of popular | ||

+ | open source packages are gnuplot, [[https://visit.llnl.gov/|VisIt]] and | ||

+ | [[https://www.paraview.org/|ParaView]]. | ||