==== Becoming a computational scientist ====

//Note: the text below is copied from Appendix A of my [[publications:thesis|PhD thesis]] from 2015.//

I have now been working on computational science and computational physics
problems for more than six years. In hindsight, what has surprised me is the
large number of things that one has to be familiar with in order to be
productive. The reason is probably the relatively large amount of do-it-yourself
(DIY) work in computing, compared to other disciplines. Below, I try to
summarize the skills that I have found to be generally important.
=== Selecting problems ===

For doing research, the most important skill is perhaps the ability to pick the
right problems. This is a rather difficult skill to master, and I am sure that I
have not mastered it yet. Still, there are a couple of simple questions that I
find useful when selecting problems:

  * Are you interested in the problem?
  * Are others interested in the problem?
  * Do you expect to learn something useful when studying the problem?
  * Does the problem seem feasible to you? If this is not clear, approximately how much time would you have to invest to find out?
  * Suppose that everything works out and you solve the problem. What would that mean to you? And what could you do next?
  * How long do you give yourself? And if you are unable to solve the full problem, is there an intermediate result that would still be of value?
  * How hard will it be to write about the results? For example, certain types of results might require a carefully written introduction, motivation, discussion or analysis.

Another question that becomes more relevant towards the end of a PhD is whether
it is possible to obtain (future) grants or other funding for a topic.
=== Theoretical skills ===

Below, I briefly discuss some of the theoretical topics that I believe to be
important for a computational scientist. The most important topic is missing
from this list, however: knowledge of the domain that you are working in. Such
knowledge helps in selecting the right problems and in making the right
approximations.
== Applied mathematics ==

These are some of the topics in applied mathematics that I think are important
for a computational scientist:

  * Linear algebra: many problems can be written as a system of linear equations.
  * Calculus: ordinary differential equations and Taylor series are important for many numerical methods. Calculus also helps you to recognize what can be calculated analytically and to construct reference solutions.
  * Statistics: Monte Carlo methods are quite common, and working with them requires at least a basic understanding of statistics. The same goes for problems that are probabilistic in nature or that involve noisy data.
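
To illustrate why some statistics is needed for Monte Carlo methods, here is a
minimal Python sketch (not taken from the thesis) that estimates $\pi$ from
random points in the unit square and reports the one-sigma standard error
together with the estimate. The function name ''mc_estimate_pi'' and the fixed
seed are illustrative choices only.

```python
import math
import random


def mc_estimate_pi(n_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    Returns the estimate and its one-sigma standard error.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            hits += 1
    p = hits / n_samples  # fraction of hits, an estimate of pi/4
    estimate = 4.0 * p
    # Each sample is a Bernoulli trial with variance p(1 - p), so the
    # standard error of the mean shrinks like 1/sqrt(n).
    std_error = 4.0 * math.sqrt(p * (1.0 - p) / n_samples)
    return estimate, std_error


for n in (1_000, 100_000):
    est, err = mc_estimate_pi(n)
    print(f"n={n:>7}: pi ~ {est:.4f} +/- {err:.4f}")
```

Using 100 times more samples only makes the standard error about 10 times
smaller, which is the typical $1/\sqrt{n}$ behavior of Monte Carlo estimates.
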
== Computer science ==

When we want to solve a problem on a computer, we have to select an appropriate
//algorithm//. Problems and algorithms can be classified by their "difficulty",
or computational cost, which is the main topic of //computational complexity
theory//. Knowing and understanding the computational cost of algorithms is
important not only for solving a problem efficiently, but also for predicting
which problems are feasible. For example, if you recognize that you are trying
to solve an [[https://en.wikipedia.org/wiki/NP-hard|NP-hard]] problem, then you
immediately know that you are limited to small problem sizes, at least if you
need an exact solution. With parallel computing, it is usually possible to go
to larger problem sizes. How much larger depends on how well the algorithmic
components can be parallelized, i.e., on the amount of local computation versus
global communication.
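
As a rough illustration of how quickly an exhaustive search becomes infeasible,
the sketch below solves the travelling salesman problem (whose optimization
version is NP-hard) by trying every tour. This is the naive method, not a
recommended one: better exact algorithms exist, but all known ones still have a
cost that grows exponentially with the number of cities. The helper names are
hypothetical.

```python
import itertools
import math


def tour_length(order, dist):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


def brute_force_tsp(dist):
    """Exact TSP by trying every tour that starts at city 0.

    There are (N-1)! candidate tours, so this only works for very small N.
    """
    n = len(dist)
    best_order, best_len = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = tour_length(order, dist)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


# Four cities on the corners of a unit square: the optimal tour has length 4.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(brute_force_tsp(dist))  # ((0, 1, 2, 3), 4.0)

# Growth of the search space: (N-1)!/2 distinct tours for symmetric distances.
for n in (5, 10, 15, 20):
    print(n, math.factorial(n - 1) // 2)
```

Already for 20 cities there are more than $10^{16}$ distinct tours, so an
exhaustive search is hopeless, whereas heuristics can still find good (but not
provably optimal) tours.
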

The practical cost of an algorithm also depends on the hardware that performs
the computations. Modern processors operate in a rather complicated way, but
knowing the cost of typical operations (for example, a memory access that
misses the cache can be far slower than a floating-point multiplication) is
important when you develop an efficient numerical method. The hardware also
determines which integer and floating-point number formats you can use.
Understanding floating-point arithmetic and its subtleties can save you a lot of
time debugging "weird" behavior.
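
A few of these floating-point subtleties can be demonstrated in a couple of
lines of Python, whose ''float'' is an IEEE 754 double on common platforms.
This is a minimal sketch; rewriting $1 - \cos x$ as $2 \sin^2(x/2)$ is one
standard remedy for the cancellation shown here, not the only one.

```python
import math

# 0.1, 0.2 and 0.3 are not exactly representable in binary floating point,
# so the rounded sum differs from the rounded literal.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Machine epsilon: the gap between 1.0 and the next larger double.
eps = 2.0 ** -52
print(1.0 + eps > 1.0, 1.0 + eps / 2 > 1.0)  # True False

# Catastrophic cancellation: (1 - cos x) / x**2 tends to 1/2 as x -> 0,
# but 1 - cos(x) loses all significant digits for tiny x.
x = 1e-8
naive = (1.0 - math.cos(x)) / x**2
# Mathematically identical, using 1 - cos x = 2 sin^2(x/2); no subtraction.
stable = 2.0 * math.sin(x / 2) ** 2 / x**2
print(naive, stable)
```
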
== Computational science ==

Although there are many types of computations, most of them fall into just a
few categories:

  * Solving linear systems of equations, i.e., solving $A x = b$ for a given matrix $A$ and vector $b$. Surprisingly many problems can be transformed into such a linear system.
  * Optimization, for example: finding the shortest route that visits $N$ cities, finding the ground state energy of a quantum system or finding the minimum of a function.
  * Ordinary and partial differential equations. Many (physical) systems can be described by such equations. Different types of partial differential equations require quite different solution strategies.
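
For the first category, the textbook method is Gaussian elimination. Below is a
minimal pure-Python sketch with partial pivoting, meant to show the algorithm
rather than to be used in practice; for real work, a library routine such as
''numpy.linalg.solve'' (which calls LAPACK) is the better choice. The function
name ''solve'' is illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A is a list of n rows (each a list of n numbers), b a list of n numbers.
    The inputs are copied, not modified.
    """
    n = len(A)
    # Augmented matrix [A | b] in floating point.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |M[i][k]| to row k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


A = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(solve(A, b))  # approximately [2.0, 3.0, -1.0]
```

Elimination costs about $\frac{2}{3} n^3$ floating-point operations for an
$n \times n$ system, which is why iterative methods are often preferred for
large sparse matrices.
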

A computational scientist should probably be familiar with the basic methods for
solving problems from these categories, so that they are able to find and select
the best method when the need arises.
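
As an example of such a basic method, the sketch below applies the forward
Euler scheme to $y' = -y$, $y(0) = 1$. Because the exact solution
$y(t) = e^{-t}$ is known, it also shows how a reference solution from calculus
can be used to check a numerical method: forward Euler is first-order accurate,
so the error at $t = 1$ should roughly halve each time the number of steps
doubles. The function names are illustrative.

```python
import math


def forward_euler(f, y0, t_end, n_steps):
    """Integrate y' = f(t, y) from t = 0 to t_end using n_steps Euler steps."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)  # first-order Taylor step: y(t+h) ~ y(t) + h y'(t)
        t += h
    return y


def f(t, y):
    """Right-hand side of the test problem y' = -y."""
    return -y


exact = math.exp(-1.0)  # reference solution y(1) = e^{-1}
for n in (10, 20, 40, 80):
    err = abs(forward_euler(f, 1.0, 1.0, n) - exact)
    print(f"n={n:>3}  error={err:.2e}")
```
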

To avoid reinventing the wheel, some knowledge of the available libraries and
codes is also valuable.
=== Practical skills ===

The best strategy for solving a problem depends on which tools are already
available. If sufficiently many other people have worked on a (similar) problem,
there might be software that you can use directly. Take for example CFD
(computational fluid dynamics), for which many different simulation tools exist.
Selecting the right one then becomes one of the most important aspects of
solving your problem.

At the other extreme, no software exists for your problem, so that you have to
develop everything yourself. There are of course also many cases in between,
for example when existing tools have to be modified to suit your needs. This
means that it is often necessary to write computer code. Below, I discuss some
of the practical aspects of writing your own code and reusing others' code.
== Computer basics ==

For computing, the ''*nix'' operating systems appear to be the most popular.
Being familiar with a variant such as GNU/Linux, BSD or OS X is therefore quite
helpful: it allows you to quickly use the code and tools that others have
written.

Good command of a text editor such as ''vim'' or ''emacs'', or of a suitable IDE
(integrated development environment), will speed up your code and text editing.
This might also reduce the risk of developing RSI (repetitive strain injury),
because most editors can be operated without a mouse ((In my experience, the combination of stress and mouse usage is most likely to cause physical discomfort.)).
There are many useful tools included in a ''*nix'' system, but ''ssh'' deserves
special mention because it allows you to work on remote systems.

There are a number of software suites for numerical or symbolic computations.
Examples of commercial packages are Matlab and Mathematica, whereas
[[https://www.gnu.org/software/octave/|Octave]] and
[[http://www.sagemath.org/|SageMath]] are free software alternatives. Their many
built-in functions can help you to quickly develop a computational method. Even
if you eventually have to implement the method in a different environment, it
can be helpful to start from a simple proof of concept. The generality of such
suites is also their drawback: typically they will not be as efficient as a
special-purpose solution.
== Programming ==

When you develop a method from scratch, you can use your preferred programming
language; this is of course not possible when you have to modify existing code.
The traditional languages for computing are C and Fortran. C in particular is
quite "low-level", so experience with C is useful for understanding how a
computer and other languages work. Fortran was specifically designed for
numerical computing, which can make code development more convenient. Another
popular //compiled// language is C++, which supports many programming styles.
This flexibility can be good for the expert but is sometimes hard for the
beginner. Performance-wise, there are no major differences between these
languages, as long as you know what you are doing.

For certain tasks, scripting or interpreted languages such as Python can be more
convenient. Such languages can for example be used to glue together other
programs, to process data or to visualize results. Python can also be used for
computations, although the numerical work is then typically performed by
routines written in C or Fortran, which are made available through Python
modules such as ''numpy''.
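
As a small example of Python as glue, the sketch below runs an external program
with the standard ''subprocess'' module, parses its two-column text output and
computes a summary statistic. To keep the sketch self-contained, the "external
program" is a one-line Python script standing in for a real simulation code; the
command ''./my_solver'' mentioned in the comments is hypothetical.

```python
import statistics
import subprocess
import sys

# Stand-in for an external simulation code: a tiny Python program that prints
# one "step value" pair per line. In practice this would be your own
# executable, e.g. ["./my_solver", "input.cfg"] (hypothetical names).
SIMULATION = [sys.executable, "-c",
              "for i in range(5): print(i, 0.5 * i * i)"]


def run_and_parse(cmd):
    """Run an external program and parse its two-column text output."""
    # check=True raises CalledProcessError if the program fails.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    steps, values = [], []
    for line in result.stdout.splitlines():
        step, value = line.split()
        steps.append(int(step))
        values.append(float(value))
    return steps, values


steps, values = run_and_parse(SIMULATION)
print("steps:", steps)
print("mean value:", statistics.mean(values))
```
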

Numerical code is no different from other code: many things can go wrong.
Sometimes a program simply does not compile or run, but at other times it might
not be clear whether the program contains a //bug// or fails for another reason.
Code often depends on (particular versions of) libraries, which is a common
source of compilation errors; understanding how code is compiled will help in
figuring out what is required. Other examples are the
//Makefiles//((Makefiles contain rules that describe how a collection of source files should be compiled. Another common build system is ''CMake''.))
included with numerical software: they might not work on your machine, in which
case you need to know how to modify them. As most programs contain bugs, basic
debugging skills are very valuable, and they become more important the larger a
project grows.
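
One technique that often helps to decide whether numerical code contains a bug
is to measure its observed order of convergence on a problem with a known answer
and to compare it with the theoretical order. The sketch below does this for the
composite trapezoidal rule, which should be second-order accurate; the function
names are illustrative.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def observed_order(method, f, a, b, exact, n=16):
    """Estimate the convergence order p from errors at n and 2n subintervals.

    If the error behaves like C * h**p, halving h divides it by 2**p.
    """
    e1 = abs(method(f, a, b, n) - exact)
    e2 = abs(method(f, a, b, 2 * n) - exact)
    return math.log2(e1 / e2)


# Reference problem: the integral of sin over [0, pi] is exactly 2.
p = observed_order(trapezoid, math.sin, 0.0, math.pi, 2.0)
print(f"observed order: {p:.3f}")  # should be close to 2 for this rule
```

An observed order clearly below the expected one (for example about 1 instead of
2) often points to a bug, such as a wrong weight or an off-by-one error in the
loop bounds.
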

Being familiar with a version control system such as git has various benefits:
you can keep track of your changes, get the latest version of a code,
collaborate with others, and so on. Perhaps even more important is being able to
visualize your results. There are many tools for this; examples of popular open
source packages are gnuplot, [[https://visit.llnl.gov/|VisIt]] and
[[https://www.paraview.org/|ParaView]].
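
Most of these tools can read simple text files. As a minimal sketch (the file
name ''decay.dat'' is arbitrary, and the file is written to the current
directory), the following Python code writes a two-column, whitespace-separated
file that gnuplot can read.

```python
import math

# Write t and exp(-t) as two whitespace-separated columns.
with open("decay.dat", "w") as out:
    out.write("# t  y\n")  # gnuplot treats lines starting with '#' as comments
    for i in range(101):
        t = 0.05 * i
        out.write(f"{t:.3f} {math.exp(-t):.6e}\n")
```

In gnuplot, the data can then be plotted with
''plot "decay.dat" using 1:2 with lines''.
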
