阶段五:迭代生成
前言
在写乘法表的时候,你可能写过类似
for i in [2, 3, 5, 7, 11, 13]: print(i)
这样的语句。for in 语句理解起来很直观形象,比起 C++ 和 java 早期的
for (int i = 0; i < n; i ++)
这样的语句,不知道简洁清晰到哪里去了。
但是,你想过 Python 在处理 for in 语句的时候,具体发生了什么吗?什么样的对象可以被 for in 来枚举呢?
容器迭代
容器这个概念非常好理解。
在 Python 中一切皆对象,对象的抽象就是类,而对象的集合就是容器。
列表 list: [0, 1, 2]
,元组 tuple: (0, 1, 2)
,字典 dict: {0:0, 1:1, 2:2}
,集合 set: set([0, 1, 2])
都是容器。
对于容器,你可以很直观地想象成多个元素在一起的单元;而不同容器的区别,正是在于内部数据结构的实现方法。
然后,你就可以针对不同场景,选择不同时间和空间复杂度的容器。所有的容器都是可迭代的(iterable)。
iterator = iter(iterable)
while True:
elem = next(iterator)
# do something
2
3
4
- 首先,在可迭代对象上调用内置
iter
函数以创建对应的迭代器。 - 要获取序列中的下一个元素,在此迭代器上调用内置
next
函数。
如果没有下一个元素了,怎么办?
我们需要对这种异常进行处理。
思考题:什么是异常处理,为什么要进行异常处理?有什么好处?
多次调用 iter
可迭代对象每次都会返回一个具有不同状态的新迭代器
你也可以调用 iter
迭代器本身,它只会返回相同的迭代器而不改变它的状态。但是,请注意,您不能直接在可迭代对象上调用 next。
>>> lst = [1, 2, 3, 4]
>>> next(lst) # Calling next on an iterable
TypeError: 'list' object is not an iterator
>>> list_iter = iter(lst) # Creates an iterator for the list
>>> list_iter
<list_iterator object ...>
>>> next(list_iter) # Calling next on an iterator
1
>>> next(list_iter) # Calling next on the same iterator
2
>>> next(iter(list_iter)) # Calling iter on an iterator returns itself
3
>>> list_iter2 = iter(lst)
>>> next(list_iter2) # Second iterator has new state
1
>>> next(list_iter) # First iterator is unaffected by second iterator
4
>>> next(list_iter) # No elements left!
StopIteration
>>> lst # Original iterable is unaffected
[1, 2, 3, 4]
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
英语练习,对迭代器的类比
Analogy: An iterable is like a book (one can flip through the pages) and an iterator for a book would be a bookmark (saves the position and can locate the next page). Calling iter
on a book gives you a new bookmark independent of other bookmarks, but calling iter
on a bookmark gives you the bookmark itself, without changing its position at all. Calling next
on the bookmark moves it to the next page, but does not change the pages in the book. Calling next
on the book wouldn't make sense semantically. We can also have multiple bookmarks, all independent of each other.
生成器:懒人迭代器!
def test_iterator():
show_memory_info('initing iterator')
list_1 = [i for i in range(100000000)]
show_memory_info('after iterator initiated')
print(sum(list_1))
show_memory_info('after sum called')
def test_generator():
show_memory_info('initing generator')
list_2 = (i for i in range(100000000))
show_memory_info('after generator initiated')
print(sum(list_2))
show_memory_info('after sum called')
%time test_iterator()
%time test_generator()
########## 输出 ##########
initing iterator memory used: 48.9765625 MB
after iterator initiated memory used: 3920.30078125 MB
4999999950000000
after sum called memory used: 3920.3046875 MB
Wall time: 17 s
initing generator memory used: 50.359375 MB
after generator initiated memory used: 50.359375 MB
4999999950000000
after sum called memory used: 50.109375 MB
Wall time: 12.5 s
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
声明一个迭代器很简单,[i for i in range (100000000)] 就可以生成一个包含一亿元素的列表。每个元素在生成后都会保存到内存中,你通过代码可以看到,它们占用了巨量的内存,内存不够的话就会出现 OOM 错误。
🤔 了解下 yield()函数吧,他可以返回一个生成器对象,试试看懂这个
>>> def gen_list(lst):
... yield from lst
...
>>> g = gen_list([1, 2, 3, 4])
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
4
>>> next(g)
StopIteration
2
3
4
5
6
7
8
9
10
11
12
13
14
思考题:python 会显示什么?为什么?
>>> s = [1, 2, 3, 4]
>>> t = iter(s)
>>> next(s)
______
>>> next(t)
______
>>> next(t)
______
>>> iter(s)
______
>>> next(iter(s))
______
>>> next(iter(t))
______
>>> next(iter(s))
______
>>> next(iter(t))
______
>>> next(t)
______
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
>>> r = range(6)
>>> r_iter = iter(r)
>>> next(r_iter)
______
>>> [x + 1 for x in r]
______
>>> [x + 1 for x in r_iter]
______
>>> next(r_iter)
______
>>> list(range(-2, 4)) # Converts an iterable into a list
______
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
任务
P10:实现 count
,它接受一个迭代器 t
并返回该值 x
出现在的前 n 个元素中的次数 t
def count(t, n, x):
"""Return the number of times that x appears in the first n elements of iterator t.
>>> s = iter([10, 9, 10, 9, 9, 10, 8, 8, 8, 7])
>>> count(s, 10, 9)
3
>>> s2 = iter([10, 9, 10, 9, 9, 10, 8, 8, 8, 7])
>>> count(s2, 3, 10)
2
>>> s = iter([3, 2, 2, 2, 1, 2, 1, 4, 4, 5, 5, 5])
>>> count(s, 1, 3)
1
>>> count(s, 4, 2)
3
>>> next(s)
2
>>> s2 = iter([4, 1, 6, 6, 7, 7, 8, 8, 2, 2, 2, 5])
>>> count(s2, 6, 6)
2
"""
"*** YOUR CODE HERE ***"
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
P11: 实现生成器函数 scale(it, multiplier)
,它产生给定迭代的元素 it
,按 multiplier
.
同时也希望你尝试使用 yield from
语句编写这个函数!
def scale(it, multiplier):
"""Yield elements of the iterable it scaled by a number multiplier.
>>> m = scale(iter([1, 5, 2]), 5)
>>> type(m)
<class 'generator'>
>>> list(m)
[5, 25, 10]
>>> # generators allow us to represent infinite sequences!!!
>>> def naturals():
... i = 0
... while True:
... yield i
... i += 1
>>> m = scale(naturals(), 2)
>>> [next(m) for _ in range(5)]
[0, 2, 4, 6, 8]
"""
"*** YOUR CODE HERE ***"
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19