import numpy as np ## It's good practice to put your imports right at the top!

7. Functions#

Follow along!

Remember that to best make use of this tutorial, it is highly recommended that you make your own notebook and type every piece of code yourself!

This is the last piece of basic programming structure that you will need before you can start writing novel and useful Python code for yourself.

To this point, we have been writing short procedures that work on one or two specific variables, but with some of our more repetitious examples, this involved writing the same code over and over. If we introduced a new variable, we might have to retype that code again with the new variable name, which would make our code unnecessarily long and harder to read. To get around this, Python makes it very easy to write our own custom functions.

You have been using many Python functions already; anytime you used the syntax command() (some characters followed by parentheses), you have been using a function. In this section, you will learn how to create your own functions, which will work similarly, but do the processes that you want. Over time, you can build up libraries of your own useful functions so that repeating analyses is as simple as running one line of code.

7.1. What is a function?#

Similar to the “function” you may know from math class, a function in Python is best understood as a name for an operation or procedure that you apply to an input to receive a desired output. Functions require a bit of abstract thought in this way, so it is helpful to stay grounded by thinking about what you are putting into the function and what you would like to get out. This is more concretely illustrated by looking at the generic syntax for a Python function.

def myfunction(arg1, arg2): # 'arg1' and 'arg2' are the INPUTS to the function
    # Do some stuff
    sum_of_args = arg1 + arg2 # As an example, let's add our arguments
    return sum_of_args # This is our OUTPUT

a = 1
b = 2
print(myfunction(a, b)) # I call the new function using 'myfunction'

From this example, you should take note of the following:

To define a function, we use the keyword def, followed by the function’s name, then parentheses containing the names of the inputs. We finish this first line with a colon. The “body” of the function, where we put our instructions, is marked by indenting those instructions.
The names arg1 and arg2 are temporary variable names that exist only inside the function definition. You can use any variable names that you want, although I recommend that you use new (never used in your code or notebook) and descriptive names in these definitions.
The keyword return tells the function what it should “spit out.” Python functions do not have to have an explicit output - several of the following examples do not - but if you do want output, you must use return.
In the above example, myfunction takes in a and b and, following the rules of its definition, adds their values together into a new variable sum_of_args. The function returns the value of this new variable.
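As a small illustration of these points, the sketch below reuses one function on two different pairs of variables. The function name rectangle_area and the variable names are made up for this example; they are not used elsewhere in the tutorial.

```python
def rectangle_area(width, height):
    # 'width' and 'height' are the inputs; 'area' is a temporary name
    # that exists only while the function is running.
    area = width * height
    return area  # the output handed back to whoever called the function

# The variable names outside the function need not match the names inside it.
desk_w, desk_h = 2, 3
rug_w, rug_h = 10, 4.5

desk_area = rectangle_area(desk_w, desk_h)
rug_area = rectangle_area(rug_w, rug_h)
print(desk_area)  # 6
print(rug_area)   # 45.0
```

Writing the multiplication once and calling it twice is the whole point: if the formula changes, there is only one place to edit.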

7.2. Simple functions#

Here are a few examples of simple functions that would have been useful in previous sections.

7.2.1. Example: conservation of print effort#

Consider the following function called greeting. This function takes in a string, which is called name in the instructions of the function, and uses it to format a greeting.

def greeting(name):
    fixed_name = name.title()  ## What does the `title` method do to a string?
    print(f"Hello, {fixed_name}, it's so nice to see you!")

We can then use this function to quickly greet all of our friends using a loop!

names = ['Alex', 'betty', 'jean-luc', 'tanya']
for name in names:
    greeting(name) ## I call my function 'greeting' here

Hello, Alex, it's so nice to see you!
Hello, Betty, it's so nice to see you!
Hello, Jean-Luc, it's so nice to see you!
Hello, Tanya, it's so nice to see you!

Notice how this function didn’t use the return statement, but it still results in some output. This is because when the function is called (executed), all the instructions inside the function are executed as if they had been pasted in. So when we use a print inside a function, it will print things to the screen when the function is called.
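To see the difference between printing and returning, we can try to store the result of greeting in a variable. This is only a sketch: the variable name result is an arbitrary choice, and the special value None that shows up here is discussed further in Section 7.3.3.

```python
def greeting(name):
    fixed_name = name.title()
    print(f"Hello, {fixed_name}, it's so nice to see you!")

# The message is printed while the function runs...
result = greeting("alex")

# ...but nothing was returned, so 'result' holds the special value None.
print(result)          # None
print(result is None)  # True
```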

7.2.2. Example: formatted addition#

We already know how to do addition with +, but what if we want to illustrate this process as well? Consider another function called nice_add that I’ve defined below, which nicely formats the addition of two numbers.

def nice_add(x, y):
    x_plus_y = x + y
    print("{} + {} = {}".format(x, y, x_plus_y))
    return x_plus_y

Here we do make use of the return statement, so the result of the computation (internally called x_plus_y in the function definition) will be “returned” to the user. This means that the function can be used to assign a value to a new variable, as seen below. This allows us to make use of the information generated in the function.

sum1 = nice_add(1, 2)
print(sum1)

sum2 = nice_add(12.5, -32.4)
print(sum2)

1 + 2 = 3
3
12.5 + -32.4 = -19.9
-19.9
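Because nice_add returns a value, a call to it can also be used anywhere a number could be used. The following sketch (the variable names total and nested are chosen just for illustration) shows two such uses.

```python
def nice_add(x, y):
    x_plus_y = x + y
    print("{} + {} = {}".format(x, y, x_plus_y))
    return x_plus_y

# A returned value can be used directly inside a larger expression...
total = nice_add(1, 2) + nice_add(3, 4)
print(total)  # 10

# ...or passed straight into another function call.
nested = nice_add(nice_add(1, 2), 10)
print(nested)  # 13
```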

Note, however, that the variable names used in the function definition (x, y, and x_plus_y) are not actually generated as variables with stored information. That is, calling the function nice_add does not make a variable x_plus_y available to the Jupyter notebook. You can see this in the cell below, where trying to print out x_plus_y generates a NameError. (This somewhat subtle note has to do with how Python keeps track of its namespace, which can be thought of as the list of functions and variables that Python currently knows about.)

print(x_plus_y)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 print(x_plus_y)

NameError: name 'x_plus_y' is not defined
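If we want to keep the result, we have to assign the returned value to a name of our own. The sketch below assumes a fresh notebook in which x_plus_y has never been defined outside the function; kept_sum is a made-up name, and the try/except is only there so that the error does not stop the cell.

```python
def nice_add(x, y):
    x_plus_y = x + y  # local name: exists only while nice_add is running
    return x_plus_y

# Assigning the returned value to a name is how we keep the result.
kept_sum = nice_add(1, 2)

# The name used inside the function is still unknown out here.
try:
    print(x_plus_y)
except NameError as err:
    print(f"Caught an error: {err}")

print(kept_sum)  # 3
```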

7.2.3. Exercises#

Modify the example function nice_add to make a new function called nicer_add, where nicer_add will check that the second input is positive before using a plus sign in the printing, otherwise using a minus sign.
Write a function called add_to_zoo that takes in a string containing an animal name and a list of animals in the zoo. Check to see if the new animal is in the zoo (the animal is in the list), and if it is, politely decline the request to add the animal to the zoo. If the animal is not in the zoo, add it to the zoo (print a statement that you have done so!).

7.3. Function inputs (arguments)#

At this point, most of the functions that we have introduced only require one or two inputs, but Python functions can be much more versatile. Specifically, Python functions can be defined to take any number of positional or keyword arguments.

7.3.1. Positional arguments#

The functions we’ve defined here and that you’ve used in previous sections have all used positional arguments, which are named after the fact that their position in the parentheses is how Python assigns the temporary variables used in the function definition. For example, when we call nice_add(1, 2), Python knows to assign the temporary variables x=1 and y=2 because 1 is the first input and 2 is the second. If we create a function with 10 positional arguments, then when we call the function, we need to provide these inputs in the correct order to maintain the correct variable assignment within the function. This also means that if we define a function with \(N\) positional arguments, then \(N\) inputs must be provided whenever calling this function to avoid a TypeError. Try calling nice_add(1) to see what this looks like.
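The effect of ordering is easiest to see when the two inputs mean different things. In this sketch, describe_trip is a hypothetical function invented for the example, and the try/except simply lets us look at the TypeError without stopping the cell.

```python
def describe_trip(origin, destination):
    # Hypothetical helper: the first input becomes 'origin',
    # the second becomes 'destination'.
    return f"Travelling from {origin} to {destination}"

print(describe_trip("Toronto", "Montreal"))
print(describe_trip("Montreal", "Toronto"))  # swapped inputs, different meaning

# Supplying too few inputs raises a TypeError.
try:
    describe_trip("Toronto")
except TypeError as err:
    print(f"TypeError: {err}")
```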

7.3.2. Keyword arguments#

While requiring certain inputs is often useful and necessary when we create functions, sometimes we want to give our function some flexibility with options and defaults. These can be set in the function definition using the = (variable assignment) syntax. For example, we can modify our greeting to greet “everyone” if no name is supplied.

def greeting(name='everyone'):
    fixed_name = name.title()  ## What does the `title` method do to a string?
    print(f"Hello, {fixed_name}, it's so nice to see you!")

Then, when calling this function, if we don’t specify a name, greeting already has a value and can run without error.

greeting()  ## The empty parentheses indicate that we're calling the function without inputs.

Hello, Everyone, it's so nice to see you!
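A default value is only a fallback, so it can still be overridden when the function is called. The short sketch below repeats the definition of greeting and shows the three ways of calling it.

```python
def greeting(name='everyone'):
    fixed_name = name.title()
    print(f"Hello, {fixed_name}, it's so nice to see you!")

greeting()              # uses the default value 'everyone'
greeting('betty')       # overrides the default by position
greeting(name='tanya')  # overrides the default by keyword
```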

7.3.3. Mixing positional and keyword arguments#

We can also mix the two types of arguments in the function definition, with the only caveat being that all positional arguments must be specified before any keyword arguments.

In the example below, we greet our friends and their guests and thank them for bringing food. However, I didn’t tell everyone to bring food, so by default we don’t expect it and we set food using a keyword argument. The default value is None, which is a special Python datum that corresponds to nothing. Using None is useful for setting default variables, because we can easily catch it using an if var is not None: clause as in the example.

def greeting_with_food(name, number_of_friends, food=None):
    
    fixed_name = name.title()
    output = f"\nHello, {fixed_name}, it's so nice to see you!"
    
    ## We only want to greet integer numbers of friends.
    if isinstance(number_of_friends, int):
        output = output + f" I see that you've brought {number_of_friends} friends, that's great!"
    
    print(output)
    
    if food is not None:
        print(f"Oh, you brought {food}!? Excellent!")

Calling this function a few times, we can see that Amy brought 3 friends and some cookies, Stefon tried to bring 1.2(?) friends, but trying to greet Quentin and his lasagna causes a problem.

greeting_with_food("Amy", 3, food="cookies")
greeting_with_food("Stefon", 1.2)
greeting_with_food("Quentin", food="lasagna") ## What's the error message say?

Hello, Amy, it's so nice to see you! I see that you've brought 3 friends, that's great!
Oh, you brought cookies!? Excellent!

Hello, Stefon, it's so nice to see you!

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 3
      1 greeting_with_food("Amy", 3, food="cookies")
      2 greeting_with_food("Stefon", 1.2)
----> 3 greeting_with_food("Quentin", food="lasagna") ## What's the error message say?

TypeError: greeting_with_food() missing 1 required positional argument: 'number_of_friends'
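One possible way to let Quentin in, assuming we are happy to make the number of friends optional as well, is to give that input a default of None too. The function greeting_with_optional_friends below is a hypothetical variant written for this illustration, not a replacement for the definition above.

```python
def greeting_with_optional_friends(name, number_of_friends=None, food=None):
    # Hypothetical variant of greeting_with_food: both extras are optional.
    output = f"Hello, {name.title()}, it's so nice to see you!"

    # None is not an integer, so the friends sentence is skipped by default.
    if isinstance(number_of_friends, int):
        output = output + f" I see that you've brought {number_of_friends} friends, that's great!"

    print(output)

    if food is not None:
        print(f"Oh, you brought {food}!? Excellent!")

greeting_with_optional_friends("Quentin", food="lasagna")  # no TypeError now
greeting_with_optional_friends("Amy", 3)
```

Whether an input should be required or optional is a design choice: requiring it forces the caller to think about it, while a default makes the common case shorter to write.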

7.3.4. Assigning inputs by name#

It’s worth noting that while positional arguments must precede keyword arguments in function definitions and when calling functions as shown above, we can mix the order of inputs when calling functions if we know the names of the positional arguments used in defining the function. This is easily seen with some examples:

greeting_with_food(name="Amy", number_of_friends=3, food="cookies")

Hello, Amy, it's so nice to see you! I see that you've brought 3 friends, that's great!
Oh, you brought cookies!? Excellent!

Here we know the positional arguments’ names, so we can clarify our code by using these (hopefully informative) names when calling the function. This lets us change the order of the inputs, because Python won’t be confused about which input is which:

greeting_with_food(food="cookies", number_of_friends=3, name="Amy")

Hello, Amy, it's so nice to see you! I see that you've brought 3 friends, that's great!
Oh, you brought cookies!? Excellent!

However, if we want to ignore these names, then we have to supply the unnamed arguments in order. For example, the following will not work:

greeting_with_food(name="Amy", 3, food="cookies")

  Cell In[14], line 1
    greeting_with_food(name="Amy", 3, food="cookies")
                                                    ^
SyntaxError: positional argument follows keyword argument
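The call can be repaired by putting every unnamed input first. The sketch below uses a condensed copy of greeting_with_food (without the leading blank line in the output) and also shows a related mistake: naming an input that has already been filled by position.

```python
def greeting_with_food(name, number_of_friends, food=None):
    output = f"Hello, {name.title()}, it's so nice to see you!"
    if isinstance(number_of_friends, int):
        output = output + f" I see that you've brought {number_of_friends} friends, that's great!"
    print(output)
    if food is not None:
        print(f"Oh, you brought {food}!? Excellent!")

# Unnamed inputs first, named inputs afterwards: this is allowed.
greeting_with_food("Amy", number_of_friends=3, food="cookies")
greeting_with_food("Amy", food="cookies", number_of_friends=3)

# Here 3 is assigned to 'name' by position, and then 'name' is given again.
try:
    greeting_with_food(3, name="Amy")
except TypeError as err:
    print(f"TypeError: {err}")
```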

7.4. Documenting functions#

At this point, we now have all the tools to completely understand the documentation of different functions such as np.sum, which is introduced as:

numpy.sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)

Here we can see that a is the only positional argument, but there are also the keyword arguments axis, dtype, out, keepdims, initial, and where.
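Reading the signature this way tells us how np.sum may be called. As a brief sketch (the array data is made up, and only the axis and keepdims keywords are exercised):

```python
import numpy as np

data = np.array([[0, 1], [0, 5]])

print(np.sum(data))          # 6: only the positional argument 'a' is given
print(np.sum(data, axis=0))  # [0 6]: sum down each column
print(np.sum(data, axis=1))  # [1 5]: sum across each row

# Keyword arguments can be combined and given in any order.
column_totals = np.sum(data, keepdims=True, axis=0)
print(column_totals.shape)   # (1, 2)
```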

7.4.1. Docstrings#

Beyond this signature, the documentation usually gives a breakdown of all the details of the function, such as descriptions of what the inputs and outputs are and examples of how the function should be used. When writing your own functions, it is good practice to provide this documentation (at least in part) to remind yourself and others what your function does and how it should be used. This is naturally done in Python with a docstring (“documentation string”), which is a special message that you can include after the first line of your function definition (after the def myfunc(inputs): line) to provide more information on your function. Providing this docstring means that whenever you use the help function on your function, it will print this docstring to the screen. For example, using help(np.sum) provides the same information as we can find at the link included earlier.

help(np.sum)

Help on function sum in module numpy:

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.
    
    Parameters
    ----------
    a : array_like
        Elements to sum.
    axis : None or int or tuple of ints, optional
        Axis or axes along which a sum is performed.  The default,
        axis=None, will sum all of the elements of the input array.  If
        axis is negative it counts from the last to the first axis.
    
        .. versionadded:: 1.7.0
    
        If axis is a tuple of ints, a sum is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
    dtype : dtype, optional
        The type of the returned array and of the accumulator in which the
        elements are summed.  The dtype of `a` is used by default unless `a`
        has an integer dtype of less precision than the default platform
        integer.  In that case, if `a` is signed then the platform integer
        is used while if `a` is unsigned then an unsigned integer of the
        same precision as the platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must have
        the same shape as the expected output, but the type of the output
        values will be cast if necessary.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the input array.
    
        If the default value is passed, then `keepdims` will not be
        passed through to the `sum` method of sub-classes of
        `ndarray`, however any non-default value will be.  If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
    initial : scalar, optional
        Starting value for the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.15.0
    
    where : array_like of bool, optional
        Elements to include in the sum. See `~numpy.ufunc.reduce` for details.
    
        .. versionadded:: 1.17.0
    
    Returns
    -------
    sum_along_axis : ndarray
        An array with the same shape as `a`, with the specified
        axis removed.   If `a` is a 0-d array, or if `axis` is None, a scalar
        is returned.  If an output array is specified, a reference to
        `out` is returned.
    
    See Also
    --------
    ndarray.sum : Equivalent method.
    
    add.reduce : Equivalent functionality of `add`.
    
    cumsum : Cumulative sum of array elements.
    
    trapz : Integration of array values using the composite trapezoidal rule.
    
    mean, average
    
    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.
    
    The sum of an empty array is the neutral element 0:
    
    >>> np.sum([])
    0.0
    
    For floating point numbers the numerical precision of sum (and
    ``np.add.reduce``) is in general limited by directly adding each number
    individually to the result causing rounding errors in every step.
    However, often numpy will use a  numerically better approach (partial
    pairwise summation) leading to improved precision in many use-cases.
    This improved precision is always provided when no ``axis`` is given.
    When ``axis`` is given, it will depend on which axis is summed.
    Technically, to provide the best speed possible, the improved precision
    is only used when the summation is along the fast axis in memory.
    Note that the exact precision may vary depending on other parameters.
    In contrast to NumPy, Python's ``math.fsum`` function uses a slower but
    more precise approach to summation.
    Especially when summing a large number of lower precision floating point
    numbers, such as ``float32``, numerical errors can become significant.
    In such cases it can be advisable to use `dtype="float64"` to use a higher
    precision for the output.
    
    Examples
    --------
    >>> np.sum([0.5, 1.5])
    2.0
    >>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
    1
    >>> np.sum([[0, 1], [0, 5]])
    6
    >>> np.sum([[0, 1], [0, 5]], axis=0)
    array([0, 6])
    >>> np.sum([[0, 1], [0, 5]], axis=1)
    array([1, 5])
    >>> np.sum([[0, 1], [np.nan, 5]], where=[False, True], axis=1)
    array([1., 5.])
    
    If the accumulator is too small, overflow occurs:
    
    >>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
    -128
    
    You can also start the sum with a value other than zero:
    
    >>> np.sum([10], initial=5)
    15

Docstrings are specified using triple-quotes """docstring""" and should either be brief one-line descriptions of the function, as seen in the following definition of greeting, or more comprehensive multi-line descriptions like the one I’ve added to greeting_with_food.

def greeting(name='everyone'):
    """This function will greet a friend nicely."""
    fixed_name = name.title()
    print(f"Hello, {fixed_name}, it's so nice to see you!")
    
def greeting_with_food(name, number_of_friends, food=None):
    """
    Greets a friend and their guests and thanks them for any food they've brought.
    
    Parameters
    ----------
    name : string
        Name of the friend to greet.
    number_of_friends : int
        Number of friends that your friend has brought with them. If not an integer, 
        this input is ignored.
    food : string, optional
        The food that your friend has brought to your party. By default, we don't
        expect anyone to have brought food, so we don't say anything.
    
    Notes
    -----
    This function doesn't return anything, but only prints a message to the screen.
    
    Examples
    --------
    
    >>> greeting_with_food("Amy", 3, food="cookies")
    Hello, Amy, it's so nice to see you! I see that you've brought 3 friends, that's great!
    Oh, you brought cookies!? Excellent!
    """
    
    fixed_name = name.title()
    output = f"\nHello, {fixed_name}, it's so nice to see you!"
    
    ## We only want to greet integer numbers of friends.
    if isinstance(number_of_friends, int):
        output = output + f" I see that you've brought {number_of_friends} friends, that's great!"
    
    print(output)
    
    if food is not None:
        print(f"Oh, you brought {food}!? Excellent!")

You can now try using help on these functions to learn about how to use them.
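As a quick check, the sketch below calls help on the one-line version of greeting. It also reads the docstring directly from the function's __doc__ attribute, which is where Python stores it.

```python
def greeting(name='everyone'):
    """This function will greet a friend nicely."""
    fixed_name = name.title()
    print(f"Hello, {fixed_name}, it's so nice to see you!")

# help() prints the function's signature followed by its docstring.
help(greeting)

# The raw docstring is also stored on the function itself.
print(greeting.__doc__)  # This function will greet a friend nicely.
```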

It’s worth noting that docstrings commonly provide:

Details on the inputs, indicating the expected type of the inputs and what their default values are if they have them
Details on the outputs of the function, including what type the outputs are
Related functions, especially in the context of a module of functions that you may be building
Notes on using the function, especially if there are non-obvious details to your implementation
Examples on how to use the function, especially if there are keyword arguments that are important to the function’s operations

Note

At a minimum, it is very good practice to include a sentence or two about what your function does and what your goals for writing it were. This will help you organize your own work and makes your code much more readable to others.
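Since greeting_with_food does not return anything, its docstring has no section describing outputs. Here is one way such a section might look for nice_add, laid out in the same style as the docstring above. The wording of the descriptions is my own suggestion, not a required format.

```python
def nice_add(x, y):
    """
    Add two numbers and print the formatted sum.

    Parameters
    ----------
    x : int or float
        First number to add.
    y : int or float
        Second number to add.

    Returns
    -------
    x_plus_y : int or float
        The sum of `x` and `y`.

    Examples
    --------
    >>> total = nice_add(1, 2)
    1 + 2 = 3
    """
    x_plus_y = x + y
    print("{} + {} = {}".format(x, y, x_plus_y))
    return x_plus_y

total = nice_add(1, 2)
```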

7.5. Exercises#

In a fashion similar to nicer_add, write a function nice_arithmetic that formats basic arithmetic operations (+, -, *, /) nicely. Use a keyword to allow the user to specify the operator they would like to use, and use another keyword to allow the user to toggle between printing symbols and plain English (i.e. between “1 + 2 = 3” and “The sum of 1 and 2 is 3”). Return the computation as output, and test your function on several examples.
Use the following pseudocode to define a function integrate that performs simple integration of a function over a range (recall that integration is simply finding the area under a curve).
- Set as inputs: the function to integrate, f, the endpoints of the interval, a, and b, as well as the number of steps to use in approximation n (you may use a default number here).
- Check that a < b and that n is a positive integer. If either of these conditions is not met, print what is wrong and use return with no output to quit the function.
- Calculate dx = (b - a)/n, the width of each approximating rectangle.
- Set left_pt = a, right_pt = a + dx, sum = 0
- While right_pt is less than or equal to b do the following:
  - Find the average of f at left_pt and right_pt, multiply by dx to approximate the area of the rectangle under the function f between left_pt and right_pt.
  - Add the result of this calculation to sum.
  - Increment left_pt and right_pt by dx to examine the next rectangle.
- Print the value of the integral (show the given endpoints and the number of approximating rectangles)
- Return the output
Test your function using np.cos and a=0, b=1 for various values of n. (As n increases, your function should return a number closer to 0.841470984807897.) If you have any issues, try this Wikipedia reference, then try this YouTube video if you are still having trouble. Note that the video is not exactly the same protocol as I have given here, so you will need to make appropriate adjustments.
Write docstrings for your functions in the previous two exercises.
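The integrate exercise asks for a function f as one of its inputs. This works because a function can be passed to another function just like a number or a string, as long as we give its name without parentheses. The sketch below is unrelated to integration and uses made-up names (evaluate_at, square) purely to show the mechanics.

```python
import math

def evaluate_at(f, points):
    # Hypothetical helper: 'f' is a function, and we call it on each point.
    results = []
    for p in points:
        results.append(f(p))
    return results

def square(x):
    return x * x

# Pass the function's name without parentheses: 'square', not 'square()'.
print(evaluate_at(square, [0, 1, 2, 3]))  # [0, 1, 4, 9]
print(evaluate_at(math.sqrt, [0, 4, 9]))  # [0.0, 2.0, 3.0]
```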

7.6. Next Steps#

At this point, you have almost all the tools you need to start programming in Python. However, the next few sections of this tutorial will be essential if you want to use Python to do quantitative work with data. In particular, learning how to generate figures and how to read in data from files (the next two sections) will be immensely useful to anyone working with data. The last few sections are also useful enough that I cannot omit them from an introductory tutorial, although you can get by without learning about dictionaries, classes, and random number generators if you need to. If you are taking my What Do Your Data Say? course, however, you will need to learn about random number generation in Python.

A Python Tutorial for Data Scientists
