Refactoring Python
Refactoring Python#
What is refactoring?#
Repeatedly reorganising and rewriting code until it’s obvious to a new reader.
When to refactor?#
Before adding complexity It is hard to test When you are repeating yourself Brittleness, changing one part breaks others Make something easier to read
Great programmers spend more time refactoring, after it is functional to make it obvious
How to refactor?#
- Identify bad code
- Improve it
- Run tests
- Fix and improve tests
- Repeat
In practise#
- Rename, split and move
- Simplify
- Redraw boundaries - modules and classes relating to each other
Prerequisites to Refactoring#
- You need (thorough) tests
- Quick tests (Max 1 or 2 mins)
- Need source control
- Willing to make mistakes
Strategies#
Extract variable and extract function#
There are hidden variables that make code seem more complicated than it is, you should extract these variables.
Bad Code: MONTHS = (‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘November’, ‘December’)
def what_to_eat(month):
if (month.lower().endswith('r') or month.lower().endswith('ary')):
print('{}: oysters'.format(month))
elif 8 > MONTHS.index(month) > 4:
print('{}; tomatoes'.format(month))
else:
print('{} asparagus'.format(month))
if __name__ == '__main__':
what_to_eat('January')
what_to_eat('July')
what_to_eat('May')
After extract variable: MONTHS = (‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘November’, ‘December’)
def what_to_eat(month):
month_lower = month.lower()
ends_in_r = month_lower.endswith('r')
ends_in_ary = month_lower.endswith('ary')
index = MONTHS.index(month)
summer = 8 > index > 4
if ends_in_r or ends_in_ary:
print('{}: oysters'.format(month))
elif summer:
print('{}; tomatoes'.format(month))
else:
print('{} asparagus'.format(month))
if __name__ == '__main__':
what_to_eat('January')
what_to_eat('July')
what_to_eat('May')
Making it more clear at each level and with python there is no performance hit so always favour clarity.
We can take it a step further and extract a function. So we need a (boolean) function determining when tomatoes are good and when oysters are good. def what_to_eat(month): if oysters_good(month): print(‘{}: oysters’.format(month)) elif tomatoes_good(month): print(‘{}: tomatoes’.format(month)) else: print(‘{}: asparagus’.format(month))
def oysters_good(month):
month_lower = month.lower()
return (
month_lower.endswith('r') or
month_lower.endswith('ary')
)
def tomatoes_good(month):
index = MONTHS.index(month)
return 8 > index > 4
This makes it much easier to read.
What if calling the method inline causes a performance problem? You can then extract the function call into a variable. def what_to_eat(month): time_for_oysters = oysters_good(month) time_for_tomatoes = tomatoes_good(month)
if time_for_oysters:
print('{}: oysters'.format(month))
elif time_for_tomatoes:
print('{}: tomatoes'.format(month))
else:
print('{}: asparagus'.format(month))
So now we have a cached value so we don’t have to do computation multiple times and clarity.
Often conditions become more and more complex and dates and months may change based on certan criteria in which case it may be best to extract variables into classes
Extract (boolean) variables into classes#
class TomatoesGood:
def __init__(self, month):
self.index = MONTH.index(month)
self._result = 8 > index > 4
def __bool__(self):
return self._result
def what_to_eat(month):
time_for_oysters = OystersGood(month)
time_for_tomatoes = TomatoesGood(month)
...
Remember python calls __bool__()
when it is being evaluated in an if
Testing Easier#
Testing is made much easier (maximises testability):
test = OystersGood('November')
self.assertTrue(test.r)
self.assertTrue(test.ary)
self.assertTrue(test)
Extract Class and Move Fields#
A complicated class with a bunch of fields and concerns bunched together then it may be time to refactor:
class Pet:
def __init__(self, name, age, *, has_scales=False, lays_eggs=False, drinks_milk=False):
self.name = name
self.age = age
self.treats_eaten = 0
self.has_scales = has_scales
self.lays_eggs = lays_eggs
self.drinks_milk = drinks_milk
this class has characteristics of a pet and characteristics of any animal. So let us talk abput boundaries:
How to redraw boundaries#
- Add an improved interface
- Maintain backwards compatibility
- Issue warnings for old usage
- Migrate old usage to new usage
- Run tests to verify correctness
- Fix and improve broken tests
- Remove code for old interface
Warnings#
Built into pythons
import warnings
warnings.warn('Helpful message')
- Defaults printing to
stderr
- You can force warnings to become exceptions:
python -W error code.py
- a tool to find old ways of doing things
Extracting Class#
Extract an Animal
class from Pet
class Animal:
def __init__(self, *, has_scales=False, lays_eggs=false, drinks_milk=False):
self.has_scales = has_scales
self.lays_eggs = lays_eggs
self.drinks_milk = drinks_milk
Have Pet take an Animal
instance and handle usage:
class Pet:
def __init__(self, name, age, animal=None, **kwargs):
if kwargs and animal is not None:
raise TypeError('Mixed Usage')
if animal is None:
warnings.warn('Should use Animal')
animal = Animal(**kwargs)
self.animal = animal
self.name = name
self.age = age
self.treats_eaten = 0
But viewing properties on Pet that are Animal
should still work, so we use @property
class Pet:
...
@property
def has_scales(self):
warning.warn('Use animal attribute')
return self.animal.has_scales
...
- Use optional arguments when splitting a class
- Use
@property
to move fields between classes and inner classes - Issue warnings for old usage
Always ask yourself when refactoring: Is this obvious?
You can use @property.setter
to set the property of a inner object
@has_scales.setter
def has_scales(self, has_scales):
warning.warn('Assign animal attribute')
self.animal.has_scales = has_scales
Example Refactoring#
An example class that we can refactor
import math
import json
class DataStats:
def stats(self, data, iage, isalary):
# iage and isalary are the starting age and salary used to
# compute the average yearly increase of salary.
# Compute average yearly increase
average_age_increase = math.floor(
sum([e['age'] for e in data])/len(data)) - iage
average_salary_increase = math.floor(
sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary
yearly_avg_increase = math.floor(
average_salary_increase/average_age_increase)
# Compute max salary
salaries = [int(e['salary'][1:]) for e in data]
threshold = '£' + str(max(salaries))
max_salary = [e for e in data if e['salary'] == threshold]
# Compute min salary
salaries = [int(d['salary'][1:]) for d in data]
min_salary = [e for e in data if e['salary'] ==
'£{}'.format(str(min(salaries)))]
return json.dumps({
'avg_age': math.floor(sum([e['age'] for e in data])/len(data)),
'avg_salary': math.floor(sum(
[int(e['salary'][1:]) for e in data])/len(data)),
'avg_yearly_increase': yearly_avg_increase,
'max_salary': max_salary,
'min_salary': min_salary
})
So what are the issues with the code above
- The is no
__init()__
constructor so there are no class variables and might as well just be a function - The
stats()
method is complicated and not obvious - There is duplicate code
We want to maintain the code, as it works but there are no tests for it. So we need to do TDR
Test driven refactoring. Once you have a test in place you can confidently modify the code and know if your refactoring has changed the results.