— | blog:computational_scientist [2019/05/03 21:07] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
+ | ==== Becoming a computational scientist ==== | ||
+ | |||
+ | //Note: the text below is copied from Appendix A of my [[publications: | ||
+ | thesis]] from 2015.// | ||
+ | |||
+ | I have now been working on computational science and computational physics | ||
+ | problems for more than six years. What has surprised me, in hindsight, is the | ||
+ | great number of things that one has to be familiar with in order to be | ||
+ | productive. The reason is probably the relatively large amount of DIY (do it | ||
+ | yourself) in computing, compared to other disciplines. Below I will try to | ||
+ | summarize what skills I have found to be generally important. | ||
+ | |||
+ | === Selecting problems === | ||
+ | |||
+ | For doing research, the most important skill is perhaps the ability to pick the | ||
+ | right problems. | ||
+ | This is a rather difficult skill to master, and I am confident that I have not | ||
+ | done so yet. | ||
+ | Still, there are several simple questions that I find useful for selecting | ||
+ | problems: | ||
+ | |||
+ | * Are you interested in the problem? | ||
+ | * Are others interested in the problem? | ||
+ | * Do you expect to learn something useful when studying the problem? | ||
+ | * Does the problem seem feasible to you? If this is not clear, roughly how much time would you have to invest to find out? | ||
+ | * Suppose that everything works out: you solve the problem. What would that mean to you? And what could you do next? | ||
+ | * How long do you give yourself? And if you are unable to solve the full problem, is there an intermediate result that could be of value? | ||
+ | * How hard will it be to write about the results? For example, for certain types of results a carefully written introduction, | ||
+ | |||
+ | Another question that becomes more relevant towards the end of a PhD is whether | ||
+ | it is possible to obtain (future) grants or funding for a topic. | ||
+ | |||
+ | === Theoretical skills === | ||
+ | |||
+ | Below, I briefly discuss some of the theoretical topics that I believe to be | ||
+ | important for a computational scientist. | ||
+ | The most important topic is missing, however: knowledge of the domain that | ||
+ | you are working in. | ||
+ | Such knowledge will help in selecting the right problems and in making the right | ||
+ | approximations. | ||
+ | |||
+ | == Applied mathematics == | ||
+ | |||
+ | These are some of the topics in applied mathematics that I think are important | ||
+ | for a computational scientist: | ||
+ | |||
+ | * Linear algebra: many problems can be written as a system of linear equations. | ||
+ | * Calculus: ordinary differential equations and Taylor series are important for many numerical methods. Calculus also helps in knowing what can be calculated analytically and in constructing reference solutions. | ||
+ | * Statistics: Monte Carlo methods are quite common; to work with them, at least a basic understanding of statistics is required. The same goes for problems that are probabilistic in nature or contain data with noise. | ||
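To illustrate why a basic understanding of statistics is needed for Monte Carlo methods, here is a minimal sketch that estimates $\pi$ by random sampling and reports a standard error alongside the estimate. The function name ''estimate_pi'' and the fixed seed are assumptions made for this illustration; they do not come from the original text.

```python
import math
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    Returns the estimate together with its standard error.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            hits += 1
    p = hits / n_samples  # estimates pi / 4
    # Standard error of a binomial proportion, scaled by the factor 4.
    std_err = 4.0 * math.sqrt(p * (1.0 - p) / n_samples)
    return 4.0 * p, std_err


if __name__ == "__main__":
    for n in (1_000, 100_000):
        value, err = estimate_pi(n)
        print(f"n = {n:>7}: pi ~ {value:.4f} +/- {err:.4f}")
```

The standard error shrinks only as $1/\sqrt{N}$, so one extra digit of accuracy costs roughly a hundred times more samples. Without the error estimate, it is hard to tell whether a result is converged or just noise.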
+ | |||
+ | == Computer science == | ||
+ | |||
+ | When we want to solve a problem on a computer, we have to select the appropriate | ||
+ | // | ||
+ | computational cost, which is the main topic of // | ||
+ | theory//. Knowing and understanding the computational cost of algorithms is not | ||
+ | only important for efficiently solving a problem, but also for predicting what | ||
+ | problems are feasible. For example, if you recognize that you are trying to | ||
+ | solve an [[https:// | ||
+ | immediately know that you are limited to small problem sizes. With parallel | ||
+ | computing, it is usually possible to go to larger problem sizes. To what extent | ||
+ | this is the case depends on how well the algorithmic components can be | ||
+ | parallelized, | ||
+ | communication. | ||
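One simple way to reason about how far parallel computing can take you is Amdahl's law. The text above does not name it, so treat it here as an illustrative assumption. The sketch below computes the ideal speedup when only a fraction of the work can be parallelized; it ignores communication cost, which makes real speedups lower still. The function name ''amdahl_speedup'' is hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup when a fraction of the run time is perfectly parallel.

    Communication and load imbalance are ignored, so this is an upper bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    for workers in (1, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With 90% of the work parallelized, the speedup stays below 10 no matter how many workers are added, because the serial part then dominates.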
+ | |||
+ | The practical cost of algorithms also has to do with the device that performs | ||
+ | the algorithmic steps or computations. Modern processors operate in a rather | ||
+ | complicated way, but knowledge of the cost of typical operations is important | ||
+ | when you have to develop an efficient numerical method. The hardware in a | ||
+ | processor also determines what integer and floating point numbers you can use. | ||
+ | Understanding floating point arithmetic and its subtleties can save you a lot of | ||
+ | time debugging `weird' | ||
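A few of the floating point subtleties mentioned above can be shown in a short sketch. It assumes IEEE 754 double precision, which is what Python's ''float'' uses on common platforms; the variable names are chosen only for this illustration.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so a direct
# equality test fails even though the values agree to about 16 digits.
exactly_equal = (0.1 + 0.2 == 0.3)             # False
close_enough = math.isclose(0.1 + 0.2, 0.3)    # True

# Addition is not associative: near 1e16 the spacing between doubles
# is 2, so adding 1.0 first is lost to rounding.
lost = (1e16 + 1.0) - 1e16    # 0.0
kept = (1e16 - 1e16) + 1.0    # 1.0

# Round-off accumulates in a naive sum; math.fsum tracks the lost bits.
values = [0.1] * 10
naive_total = sum(values)           # 0.9999999999999999
accurate_total = math.fsum(values)  # 1.0

if __name__ == "__main__":
    print(exactly_equal, close_enough)
    print(lost, kept)
    print(naive_total, accurate_total)
```

In practice this means comparing floating point results with a tolerance, and being careful with the order of operations when numbers of very different magnitude are combined.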
+ | |||
+ | == Computational science == | ||
+ | |||
+ | Although there are many types of computations, | ||
+ | into just a few categories: | ||
+ | |||
+ | * Solving linear systems of equations, i.e., solve $A x = b$ for a given matrix $A$ and vector $b$. Surprisingly many problems can be transformed into such a linear system. | ||
+ | * Optimization, | ||
+ | * Ordinary and partial differential equations. Many (physical) systems can be described by such equations. Different types of partial differential equations require quite different solution strategies. | ||
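As an illustration of the first category, the following is a minimal sketch of solving $A x = b$ by Gaussian elimination with partial pivoting, written in plain Python so that every step is visible. The function name ''solve_linear_system'' is hypothetical. For real work, an established library routine is usually the better choice.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    `a` is a list of n rows with n entries each, `b` a list of n values.
    """
    n = len(b)
    # Work on an augmented copy [A | b] so the inputs stay untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Use the row with the largest entry in this column as pivot,
        # which limits the growth of round-off errors.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


if __name__ == "__main__":
    print(solve_linear_system([[2, 1], [1, 3]], [3, 5]))
```

The cost of this direct method grows as $n^3$ for an $n \times n$ matrix, which is one reason why large sparse systems are normally solved with specialized or iterative methods instead.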
+ | |||
+ | A computational scientist should probably be familiar with the basic methods for | ||
+ | solving problems from these categories, in order to find and select the best | ||
+ | methods when the need arises. | ||
+ | To avoid reinventing the wheel, some knowledge of the available libraries and | ||
+ | codes is valuable. | ||
+ | |||
+ | === Practical skills === | ||
+ | |||
+ | The best strategy for solving a problem depends on what tools are already | ||
+ | available. If sufficiently many other people have worked on a (similar) problem, | ||
+ | software might be available that you can directly use. Take for example CFD | ||
+ | (computational fluid dynamics), for which there are many different simulation | ||
+ | tools. Selecting the right one then becomes one of the most important aspects of | ||
+ | solving your problem. | ||
+ | |||
+ | The other extreme would be that no software exists for your problem, so | ||
+ | that you have to develop everything yourself. There are of course also many | ||
+ | cases in between, for example when existing tools have to be modified to suit | ||
+ | your needs. This means that it is often necessary to write computer code. Below, | ||
+ | some of the practical aspects of writing your own code and reusing others' | ||
+ | are discussed. | ||
+ | |||
+ | == Computer basics == | ||
+ | |||
+ | For computing, the '' | ||
+ | Being familiar with a system such as GNU/Linux, BSD or OS X is therefore quite | ||
+ | helpful -- this allows you to quickly use the code and tools that others have | ||
+ | written. | ||
+ | |||
+ | Good command of a text editor such as '' | ||
+ | (integrated development environment) will speed up your code and text editing. | ||
+ | This might also reduce the risk of developing RSI (repetitive strain injury), | ||
+ | because most editors can be operated without a mouse ((In my experience, the | ||
+ | combination of stress and mouse usage is most likely to cause physical | ||
+ | discomfort.)). There are many useful tools included in a '' | ||
+ | '' | ||
+ | |||
+ | There exist a number of software suites for doing numerical or symbolic | ||
+ | computations. Examples of commercial packages are Matlab and Mathematica, | ||
+ | whereas [[https:// | ||
+ | [[http:// | ||
+ | alternatives. The many built-in functions can help you to quickly develop a | ||
+ | computational method. Even if you eventually have to implement this solver in a | ||
+ | different environment, | ||
+ | proof-of-concept. The generality of such suites is also their drawback: | ||
+ | typically they will not be as efficient as a special purpose solution. | ||
+ | |||
+ | == Programming == | ||
+ | |||
+ | When you develop a method from scratch, you can use your preferred programming | ||
+ | language -- this is of course not possible when you have to modify an existing | ||
+ | method. The traditional languages for computing are C and Fortran. Especially C | ||
+ | is quite `low-level', | ||
+ | how a computer and other languages work. Fortran was specifically designed for | ||
+ | numerical computing, which can make code development more convenient. Another | ||
+ | popular // | ||
+ | styles. This flexibility can be good for the expert but is sometimes hard for | ||
+ | the beginner. Performance-wise, there are no major differences between these | ||
+ | languages as long as you know what you are doing. | ||
+ | |||
+ | For certain tasks, scripting or interpreted languages such as Python can be more | ||
+ | convenient. | ||
+ | Such languages can for example be used to glue together other programs, to | ||
+ | process data or to visualize results. | ||
+ | Python can also be used for computations, | ||
+ | typically performed by routines written in C or Fortran, which are made | ||
+ | available by Python modules such as '' | ||
+ | |||
+ | Numerical code is no different from other code: many things can go wrong. | ||
+ | Sometimes a program simply does not compile or run, but at other times it might | ||
+ | not be clear whether there is a //bug// or whether there is a failure for | ||
+ | another reason. Code often depends on (particular versions of) libraries, which | ||
+ | is a source of compilation errors; understanding how code is compiled will help | ||
+ | in figuring out what is required. Another example is the | ||
+ | // | ||
+ | files should be compiled. Another common build system is '' | ||
+ | with numerical software: they might not work on your machine, in which case you | ||
+ | need to know how to modify them. As most programs contain bugs, basic debugging | ||
+ | skills are very valuable. The larger a project grows, the more important these | ||
+ | skills become. | ||
+ | |||
+ | Being familiar with a version control system such as git has various benefits: | ||
+ | you can keep track of your changes, get the latest version of a code, | ||
+ | collaborate with others, and so on. Perhaps even more important is being able to | ||
+ | visualize your results. There exist many tools for this; examples of popular | ||
+ | open source packages are gnuplot, [[https:// | ||
+ | [[https:// | ||