EDS 217, Lecture 4: DRY 🏜 vs. WET 🌊

DRY Programming

The idea of DRY is to reduce the repetition of code.

DRY vs. WET

If DRY means “Don’t Repeat Yourself”, then WET means “Write Every Time” or “We Enjoy Typing”.

Don’t write WET code!

How to DRY out your code

We write DRY code - or we DRY out WET code - through a combination of abstraction and normalization.

Abstraction

The “principle of abstraction” aims to reduce duplication of information (usually code) in a program whenever it is practical to do so:

“Each significant piece of functionality in a program should be implemented in just one place in the source code. Where similar functions are carried out by distinct pieces of code, it is generally beneficial to combine them into one by abstracting out the varying parts.”

Benjamin C. Pierce - Types and Programming Languages

Abstraction Example

The easiest way to understand abstraction is to see it in action. Here’s an example that you are already familiar with: determining the energy emitted by an object as a function of its temperature.

\(Q = \epsilon \sigma T^4\)

where \(\epsilon\) is the object’s emissivity, \(\sigma\) is the Stefan-Boltzmann constant, and \(T\) is its temperature in kelvin.

Abstraction Example

We might write the following code to determine \(Q\):

. . .
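
A minimal sketch of what such a calculation might look like. The emissivity and temperature values here are illustrative assumptions, not values from the lecture.

```python
# Stefan-Boltzmann law: Q = epsilon * sigma * T**4
sigma = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
epsilon = 0.95    # assumed emissivity (illustrative)
T = 300.0         # assumed temperature in kelvin (illustrative)

Q = epsilon * sigma * T**4
print(f"Q = {Q:.1f} W/m^2")
```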

Abstraction Example

But this code is going to get very WET very fast.

. . .
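
As a sketch of how the repetition creeps in, suppose we need \(Q\) for several objects. The object names, emissivities, and temperatures below are hypothetical.

```python
sigma = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

# The same formula is typed out once per object: this is WET code.
Q_soil = 0.95 * sigma * 290.0**4
Q_leaf = 0.98 * sigma * 295.0**4
Q_water = 0.96 * sigma * 285.0**4
```

Every new object means another copy of the formula, and a typo in any one copy (say, `**3` instead of `**4`) is easy to miss.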

Abstraction Example

Here’s a DRY version obtained using abstraction:

. . .
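
One possible DRY version, sketched under the same assumptions: the formula lives in a single function, and only the varying parts (emissivity and temperature) are passed in. The function name `emitted_energy` and the object values are hypothetical.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def emitted_energy(temperature, emissivity=1.0):
    """Return the emitted energy flux (W/m^2) for a temperature in kelvin."""
    return emissivity * SIGMA * temperature**4


# Hypothetical (emissivity, temperature) pairs for a few objects.
objects = {
    "soil": (0.95, 290.0),
    "leaf": (0.98, 295.0),
    "water": (0.96, 285.0),
}

# The formula is written once and reused for every object.
Q = {name: emitted_energy(T, eps) for name, (eps, T) in objects.items()}
```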

Abstraction Summary, Part 1

We keep our code DRY by using abstraction. In addition to functions, Python also provides classes as another important way to create abstractions.
Functions and classes are the subject of tomorrow’s exercise.
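
As a rough preview of a class-based abstraction, here is a minimal sketch. The `Surface` class and its values are hypothetical and only illustrate bundling data with the calculation that uses it.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


class Surface:
    """A hypothetical surface with an emissivity and a temperature in kelvin."""

    def __init__(self, name, emissivity, temperature):
        self.name = name
        self.emissivity = emissivity
        self.temperature = temperature

    def emitted_energy(self):
        """Return the emitted energy flux (W/m^2) for this surface."""
        return self.emissivity * SIGMA * self.temperature**4


leaf = Surface("leaf", 0.98, 295.0)
leaf_Q = leaf.emitted_energy()
```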

Abstraction Summary, Part 2

In general, the process of keeping code DRY through successive layers of abstraction is known as refactoring.
The “Rule of Three” states that you should probably consider refactoring (i.e. adding abstraction) whenever you find your code doing the same thing three times or more.
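
A small sketch of the Rule of Three in practice, using a hypothetical unit conversion that appears three times before being refactored into a function.

```python
# Before refactoring: the same conversion is written three times.
morning_k = 12.5 + 273.15
noon_k = 24.0 + 273.15
evening_k = 17.5 + 273.15


# After refactoring: one function captures the repeated step.
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


temps_k = [celsius_to_kelvin(c) for c in (12.5, 24.0, 17.5)]
```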

Normalization

Normalization is the process of structuring data in order to reduce redundancy and improve integrity.

Normalization

Some of the key principles of Normalization include:

All data have a Primary Key, which uniquely identifies each record. In Python, this key is usually called an index.
Atomic columns, meaning each entry contains a single value. No collections should appear as elements within a data table (i.e. “cells” in structured data should not contain lists!).
No transitive dependencies. This means that columns within a data table should not depend on one another; each column should depend only on the primary key.

Primary Keys

This form of normalization is easy to obtain: the idea of an index is embedded in almost every Python data structure, and it is a core component of the data structures within pandas, the most popular data science library in Python (coming next week!).

Primary Keys

. . .
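
A minimal sketch of a primary key using a plain Python dictionary, where the dictionary key plays the role of the index. The station IDs and attributes are hypothetical.

```python
# Hypothetical station records, keyed by a unique station ID.
stations = {
    "ST01": {"name": "Ridge", "elevation_m": 1210},
    "ST02": {"name": "Valley", "elevation_m": 340},
}

# The key identifies exactly one record.
record = stations["ST02"]

# Keys cannot repeat: assigning to an existing key replaces that record
# instead of creating a second "ST01".
stations["ST01"] = {"name": "Ridge", "elevation_m": 1215}
```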

Atomic Columns

The idea of atomic columns is that each element in a data structure should contain a single value. This requirement is harder to meet, and you will sometimes violate it.

. . .
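
A sketch contrasting a non-atomic layout with an atomic one, using plain lists of dictionaries as a stand-in for a data table. The station names and temperatures are hypothetical.

```python
# Non-atomic: one "cell" (temps_c) holds a list of values.
non_atomic = [
    {"station": "ST01", "temps_c": [12.5, 24.0, 17.5]},
    {"station": "ST02", "temps_c": [15.0, 27.5]},
]

# Atomic: one row per observation, so every cell holds a single value.
atomic = [
    {"station": row["station"], "obs": i, "temp_c": t}
    for row in non_atomic
    for i, t in enumerate(row["temps_c"])
]
```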

Transitive Dependencies

A transitive dependency arises when multiple associated attributes are stored within the same data structure, so that one attribute is determined by another attribute rather than by the primary key alone.

Transitive dependencies make updating data very difficult, but they can be helpful in analyzing data.
So we should only introduce them in data that we will not be editing.

Environmental data, and especially time series, are rarely modified after creation, so we don’t need to worry as much about these dependencies.

For example, contrast a data record of “temperatures through time” to a data record of “user contacts in a social network”.
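
To make the update problem concrete, here is a sketch with hypothetical observation records. Elevation depends on the station, not on the observation, so it is repeated in every row; moving it into its own table means a correction is made in one place.

```python
# Each row repeats elevation_m, which is determined by station rather than
# by obs_id: a transitive dependency.
observations = [
    {"obs_id": 1, "station": "ST01", "elevation_m": 1210, "temp_c": 12.5},
    {"obs_id": 2, "station": "ST01", "elevation_m": 1210, "temp_c": 24.0},
    {"obs_id": 3, "station": "ST02", "elevation_m": 340, "temp_c": 15.0},
]

# Normalized: station attributes live in one table, keyed by station.
stations = {
    row["station"]: {"elevation_m": row["elevation_m"]} for row in observations
}
normalized = [
    {k: v for k, v in row.items() if k != "elevation_m"} for row in observations
]

# A corrected elevation is now updated once, not once per observation.
stations["ST01"]["elevation_m"] = 1215

# Look the attribute up through the station key when it is needed.
elev = stations[normalized[0]["station"]]["elevation_m"]
```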

Normalization Summary

In general, for data analysis, basic normalization is handled for you.

For read-only data with fixed associations, a lack of normalization is manageable.
However, many analyses are easier if you structure your data to be as normalized as possible.
If you are collecting data, then it is important to develop an organizational structure that is normalized.

The End