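While grinding through algorithm problems, I kept looking back at my own code and finding it messy, and then I'd see other people's short solutions and think, wow, you can do it like that? So I'd like to share what I learned from DataCamp's Efficient Code course.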

What is efficient code?

How you define efficient code changes a lot of what follows. Two things matter most:

Fast, with minimal runtime (elapsed time)
Minimal resource usage

Python has a piece of advice for this: write Pythonic code.

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Source: https://wikidocs.net/7907

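To pull out the key points: avoid nesting and leaving things implicit, and write code that is as explicit, readable, and simple as possible.

Built-in functions: print, range, len, round, enumerate, map, zip, and so on.
Built-in (standard library) modules: os, sys, itertools, collections, and so on.

A few points really hit home for me while learning this:

Replacing overused for loops with concise list comprehensions
Making use of NumPy broadcasting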

%timeit lets you measure the runtime of your code.

-r : number of runs
-n : number of loops per run
-o : save the output to a variable

%lsmagic lists every available magic command.

import numpy as np
import pandas as pd
%timeit -r2 -n10 rand_nums = np.random.rand(1000)

11.1 µs ± 2.84 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)

times = %timeit -o rand_nums = np.random.rand(1000)

The slowest run took 10.20 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.48 µs per loop

print(times.all_runs)
print(times.best)
print(times.worst)

[0.9632805540001073, 0.9514958920000254, 0.9509307729999819, 0.9479201429999193, 1.0195112180000478]
9.479201429999193e-06
9.669099995335273e-05

Assigning the result to a variable like this lets you inspect the details.
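Outside IPython, a similar measurement can be approximated with the standard-library timeit module. The following is only a minimal sketch: `make_squares` is a hypothetical stand-in workload (not from the course), and the `repeat` and `number` arguments play roughly the roles of -r and -n.

```python
import timeit

def make_squares():
    # Hypothetical workload standing in for np.random.rand(1000)
    return [i * i for i in range(1000)]

# repeat=2 roughly corresponds to -r2, number=10 to -n10
totals = timeit.repeat(make_squares, repeat=2, number=10)

# Each entry is the total time for one run of 10 loops, so divide by 10
per_loop = [t / 10 for t in totals]
print(f"best: {min(per_loop):.2e} s per loop, worst: {max(per_loop):.2e} s per loop")
```

Unlike %timeit, timeit.repeat does not pick the loop count automatically, so `number` has to be chosen by hand.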

One thing I enjoyed reading about: which is faster, list() or []?
The literal syntax [] is faster than calling the type by name with list(). Try it yourself!
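One way to try it without IPython is sketched below, using the standard-library timeit and dis modules. The exact timings and opcode names vary by Python version and machine, so treat the printed numbers as indicative only. The bytecode hints at the reason: [] compiles to a single BUILD_LIST instruction, while list() has to look up the name and then make a call.

```python
import dis
import timeit

# Time one million constructions of an empty list, both ways
literal_time = timeit.timeit("[]", number=1_000_000)
named_time = timeit.timeit("list()", number=1_000_000)
print(f"[]    : {literal_time:.3f} s per million")
print(f"list(): {named_time:.3f} s per million")

# Collect the opcode names each expression compiles to
literal_ops = [ins.opname for ins in dis.get_instructions(compile("[]", "<literal>", "eval"))]
named_ops = [ins.opname for ins in dis.get_instructions(compile("list()", "<named>", "eval"))]
print(literal_ops)
print(named_ops)
```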

Code Profiling : Time

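Sometimes you want more detail, line by line. %timeit only gives you one overall number for a single statement (or for a whole cell with %%timeit), but we usually wrap code in functions with def, so it helps to know which line inside a function takes the longest. Install line_profiler with pip install line_profiler.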

!pip install line_profiler

Collecting line_profiler
  Downloading line_profiler-3.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (66 kB)
     |████████████████████████████████| 66 kB 2.4 MB/s 
Requirement already satisfied: IPython>=0.13 in /usr/local/lib/python3.7/dist-packages (from line_profiler) (5.5.0)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (1.0.18)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (5.1.1)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (4.4.2)
Requirement already satisfied: pexpect in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (4.8.0)
Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (2.6.1)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (0.7.5)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (0.8.1)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from IPython>=0.13->line_profiler) (57.4.0)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->IPython>=0.13->line_profiler) (1.15.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->IPython>=0.13->line_profiler) (0.2.5)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect->IPython>=0.13->line_profiler) (0.7.0)
Installing collected packages: line-profiler
Successfully installed line-profiler-3.4.0

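def convert_units(heroes, hts, wts):
  # Convert heights from cm to inches and weights from kg to lbs
  new_hts = [ht * 0.39370 for ht in hts]
  new_wts = [wt * 2.20462 for wt in wts]

  hero_data = {}
  for i, hero in enumerate(heroes):
    hero_data[hero] = (new_hts[i], new_wts[i])

  return hero_data

# Assumption: the inputs are not shown in the original notebook;
# these values are inferred from the output below
heroes = ['Batman', 'Superman', 'Wonder Woman']
hts = [188.0, 191.0, 183.0]
wts = [95.0, 101.0, 74.0]
convert_units(heroes, hts, wts)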

{'Batman': (74.01559999999999, 209.4389),
 'Superman': (75.19669999999999, 222.66661999999997),
 'Wonder Woman': (72.0471, 163.14188)}

%load_ext line_profiler
# Tells IPython that we are going to use the line profiler

%lprun -f
# The -f flag says which function we want to profile

%load_ext line_profiler
%lprun -f convert_units

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler

On its own this doesn't tell you much; you also have to pass the call to run. The form is %lprun -f function_name function_name(arguments).

%reload_ext line_profiler
%lprun -f convert_units convert_units(heroes, hts, wts)

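Now it's much clearer, right? The output tells you:
how many times each line was executed (Hits),
which lines took the most time (% Time),
and which statement each line is (Line Contents).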

Code profiling : Memory

Now that we've looked at time, let's check how much of our resources the code is eating up.

Run pip install memory_profiler. The usage is the same as above!

!pip install memory_profiler

Collecting memory_profiler
  Downloading memory_profiler-0.60.0.tar.gz (38 kB)
Requirement already satisfied: psutil in /usr/local/lib/python3.7/dist-packages (from memory_profiler) (5.4.8)
Building wheels for collected packages: memory-profiler
  Building wheel for memory-profiler (setup.py) ... done
  Created wheel for memory-profiler: filename=memory_profiler-0.60.0-py3-none-any.whl size=31285 sha256=8fca6dd0956c7d8db0c21bd2b1fd3a3cd5d968f8a7ffb1a2142c140ff0a2d516
  Stored in directory: /root/.cache/pip/wheels/67/2b/fb/326e30d638c538e69a5eb0aa47f4223d979f502bbdb403950f
Successfully built memory-profiler
Installing collected packages: memory-profiler
Successfully installed memory-profiler-0.60.0

%load_ext memory_profiler
%mprun -f convert_units convert_units(heroes, hts, wts)

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler
ERROR: Could not find file <ipython-input-15-c2659a0c0e8b>
NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.

%mprun can only be used on functions defined in physical files, and not in the IPython environment

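Ah... I'm working in Colab right now, so I can't show you the output 😥😥
If you can put the function in a physical .py file and import it (for example when working from PyCharm instead of defining it in a notebook cell), give it a try: the output shows, in MiB, how much memory is in use at each line and how much it grew compared with the previous line.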
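When %mprun is not an option, the standard-library tracemalloc module can give a rough picture instead. Note that this is a different tool from memory_profiler: it tracks allocations made by Python objects rather than reporting process memory line by line. The sketch below is only illustrative, and the input lists are made-up sample data sized so the numbers are visible.

```python
import tracemalloc

def convert_units(heroes, hts, wts):
    new_hts = [ht * 0.39370 for ht in hts]
    new_wts = [wt * 2.20462 for wt in wts]
    return {hero: (new_hts[i], new_wts[i]) for i, hero in enumerate(heroes)}

# Assumed sample inputs (hypothetical, not from the original post)
heroes = [f"hero_{i}" for i in range(30_000)]
hts = [188.0] * 30_000
wts = [95.0] * 30_000

tracemalloc.start()
result = convert_units(heroes, hts, wts)
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 2**20:.2f} MiB, peak: {peak / 2**20:.2f} MiB")
# Show the three source lines that allocated the most memory
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```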

Combining, Counting, iterating

I'll use a small dataset to keep the explanation simple.
Version 1 below is how I would naively write it.
Version 2 uses the built-in zip function to do the same thing more concisely.

names = ['피카츄', '파이리', '꼬북이']
hps = [45, 39, 44]

combined = []

for idx, pokemon in enumerate(names):
  combined.append((pokemon, hps[idx]))
print(combined)

[('피카츄', 45), ('파이리', 39), ('꼬북이', 44)]

combined_zip = list(zip(names, hps))  # zip returns an iterator, so wrap it in list() to print it
print(combined_zip)

[('피카츄', 45), ('파이리', 39), ('꼬북이', 44)]

The collections module offers several other specialized container types as well.
namedtuple : tuple subclasses with named fields
deque : list-like container with fast appends and pops
Counter : dict for counting hashable objects
OrderedDict : dict that retains order of entries
defaultdict : dict that calls a factory function to supply missing values
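As a quick taste of three of the containers listed above, here is a small sketch. The Pokémon names, HP value, and type pairs are sample data chosen for illustration, not taken from the course dataset.

```python
from collections import namedtuple, deque, defaultdict

# namedtuple: a tuple whose fields can also be accessed by name
Pokemon = namedtuple("Pokemon", ["name", "hp"])
pika = Pokemon(name="Pikachu", hp=45)
print(pika.name, pika.hp)

# deque: cheap appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
first = queue.popleft()
print(first, list(queue))

# defaultdict: missing keys are created by the factory (here, list)
by_type = defaultdict(list)
for name, kind in [("Pikachu", "Electric"), ("Squirtle", "Water"), ("Psyduck", "Water")]:
    by_type[kind].append(name)
print(dict(by_type))
```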

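I can't walk through all of them here, so please bear with me 🤕

For example, sometimes you want to know how many times each item appears.
This comes up now and then in problems on Baekjoon Online Judge when preparing for coding tests.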

from collections import Counter
print(Counter(types))

Counter({'Water': 66, 'Normal': 64, 'Bug': 51, 'Grass': 47, 'Psychic': 31, 'Rock': 29, 'Fire': 27, 'Electric': 25, 'Ground': 23, 'Fighting': 23, 'Poison': 22, 'Steel': 18, 'Ice': 16, 'Fairy': 16, 'Dragon': 16, 'Ghost': 13, 'Dark': 13})
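The `types` list itself is not shown in the post, so here is a self-contained sketch with a small made-up list (an assumption for illustration) to show how Counter behaves.

```python
from collections import Counter

# Hypothetical sample; the post's `types` list holds one type per Pokémon
types = ["Water", "Fire", "Water", "Grass", "Water", "Fire"]

type_counts = Counter(types)
print(type_counts)

# The two most frequent items, as (item, count) pairs
print(type_counts.most_common(2))  # [('Water', 3), ('Fire', 2)]

# Looking up a key that never appeared returns 0 instead of raising KeyError
print(type_counts["Dragon"])
```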

Beyond collections, the itertools module provides the following functions.
Personally, I often use combinations when I need to enumerate combinations.

Infinite iterators: count, cycle, repeat
Finite iterators: accumulate, chain, zip_longest, etc.
Combination generators: product, permutations, combinations
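A short sketch of a few of these itertools functions follows. The input lists are made-up sample data.

```python
from itertools import accumulate, chain, combinations, permutations, product

types = ["Fire", "Water", "Grass"]  # hypothetical sample data

pairs = list(combinations(types, 2))     # order does not matter
orders = list(permutations(types, 2))    # order matters
grid = list(product([0, 1], repeat=2))   # Cartesian product
print(pairs)
print(len(orders), grid)

running = list(accumulate([1, 2, 3, 4]))  # running totals
merged = list(chain([1, 2], [3]))         # iterate over several iterables in sequence
print(running, merged)
```

All of these return lazy iterators, which is why the sketch wraps them in list() before printing.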

There is also the set type, which is great for removing duplicates.
On top of that, sets have methods that play a role similar to JOIN in SQL.

Note: these methods are only available on set types.

intersection() : all elements that are in both sets / A ∩ B
difference() : all elements in one set but not the other / A - B
symmetric_difference() : all elements in exactly one set / (A ∪ B) - (A ∩ B)
union() : all elements that are in either set / A ∪ B
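The four methods can be tried on two small lists, as in the sketch below. The names are made-up sample data, and sorted() is used only to make the printed order predictable, since sets are unordered.

```python
list_a = ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"]  # hypothetical sample data
list_b = ["Pikachu", "Squirtle", "Psyduck"]

# Converting to set also removes any duplicates
set_a, set_b = set(list_a), set(list_b)

both = set_a.intersection(set_b)               # in A and in B
only_a = set_a.difference(set_b)               # in A but not in B
exactly_one = set_a.symmetric_difference(set_b)  # in A or B, but not both
either = set_a.union(set_b)                    # in A or in B

print(sorted(both))
print(sorted(only_a))
print(sorted(exactly_one))
print(sorted(either))
```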

Eliminating loops

We use for and while all the time, and often there is no way around them. Still, replacing a loop whenever you reasonably can is what makes code Pythonic.
Let's compare three approaches:

for loop
List comprehension
Built-in map() function

%%timeit
totals = []
for row in poke_stats:
  totals.append(sum(row))

100000 loops, best of 5: 11.1 µs per loop

%timeit totals_comp = [sum(row) for row in poke_stats]

100000 loops, best of 5: 8.65 µs per loop

%timeit totals_map = [*map(sum, poke_stats)]

100000 loops, best of 5: 6.32 µs per loop

See the results? This is how you can cut down the runtime ✈
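Since `poke_stats` is not defined in the post, here is a self-contained sketch of the same comparison that runs outside IPython. The rows are assumed sample data (six stats per row), and the relative timings will differ by machine and Python version, so the ordering seen above is not guaranteed.

```python
import timeit

# Hypothetical stand-in for poke_stats: rows of six base stats
poke_stats = [
    [45, 49, 49, 65, 65, 45],
    [39, 52, 43, 60, 50, 65],
    [44, 48, 65, 50, 64, 43],
] * 100

def with_loop():
    totals = []
    for row in poke_stats:
        totals.append(sum(row))
    return totals

def with_comprehension():
    return [sum(row) for row in poke_stats]

def with_map():
    return [*map(sum, poke_stats)]

# All three approaches must produce the same totals
assert with_loop() == with_comprehension() == with_map()

for fn in (with_loop, with_comprehension, with_map):
    elapsed = timeit.timeit(fn, number=1_000)
    print(f"{fn.__name__:<20} {elapsed:.4f} s for 1,000 calls")
```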