# 阶段五:迭代生成 ## 前言 在写乘法表的时候,你可能写过类似 ```python for i in [2, 3, 5, 7, 11, 13]: print(i) ``` 这样的语句。for in 语句理解起来很直观形象,比起 C++ 和 java 早期的 ```c for (int i = 0; i < n; i ++) ``` 这样的语句,不知道简洁清晰到哪里去了。 但是,你想过 Python 在处理 for in 语句的时候,具体发生了什么吗?什么样的对象可以被 for in 来枚举呢? ## 容器迭代 容器这个概念非常好理解。 在 Python 中一切皆对象,对象的抽象就是类,而对象的集合就是容器。 列表`list: [0, 1, 2]`,元组`tuple: (0, 1, 2)`,字典`dict: {0:0, 1:1, 2:2}`,集合`set: set([0, 1, 2])`都是容器。 对于容器,你可以很直观地想象成多个元素在一起的单元;而不同容器的区别,正是在于内部数据结构的实现方法。 然后,你就可以针对不同场景,选择不同时间和空间复杂度的容器。所有的容器都是可迭代的(iterable)。 ```python iterator = iter(iterable) while True: elem = next(iterator) # do something ``` - 首先,在可迭代对象上调用内置 `iter` 函数以创建对应的*迭代器*。 - 要获取序列中的下一个元素,在此迭代器上调用内置 `next` 函数。 如果没有下一个元素了,怎么办? 我们需要对这种异常进行处理。 思考题:什么是异常处理,为什么要进行异常处理?有什么好处? 多次调用 `iter` 可迭代对象每次都会返回一个具有不同状态的新迭代器 你也可以调用 `iter` 迭代器本身,它只会返回相同的迭代器而不改变它的状态。但是,请注意,您不能直接在可迭代对象上调用 next。 ```python >>> lst = [1, 2, 3, 4] >>> next(lst) # Calling next on an iterable TypeError: 'list' object is not an iterator >>> list_iter = iter(lst) # Creates an iterator for the list >>> list_iter >>> next(list_iter) # Calling next on an iterator 1 >>> next(list_iter) # Calling next on the same iterator 2 >>> next(iter(list_iter)) # Calling iter on an iterator returns itself 3 >>> list_iter2 = iter(lst) >>> next(list_iter2) # Second iterator has new state 1 >>> next(list_iter) # First iterator is unaffected by second iterator 4 >>> next(list_iter) # No elements left! StopIteration >>> lst # Original iterable is unaffected [1, 2, 3, 4] ``` ## 英语练习,对迭代器的类比 **Analogy**: An iterable is like a book (one can flip through the pages) and an iterator for a book would be a bookmark (saves the position and can locate the next page). Calling `iter` on a book gives you a new bookmark independent of other bookmarks, but calling `iter` on a bookmark gives you the bookmark itself, without changing its position at all. Calling `next` on the bookmark moves it to the next page, but does not change the pages in the book. Calling `next` on the book wouldn't make sense semantically. We can also have multiple bookmarks, all independent of each other. ## 生成器:懒人迭代器! ```python def test_iterator(): show_memory_info('initing iterator') list_1 = [i for i in range(100000000)] show_memory_info('after iterator initiated') print(sum(list_1)) show_memory_info('after sum called') def test_generator(): show_memory_info('initing generator') list_2 = (i for i in range(100000000)) show_memory_info('after generator initiated') print(sum(list_2)) show_memory_info('after sum called') %time test_iterator() %time test_generator() ########## 输出 ########## initing iterator memory used: 48.9765625 MB after iterator initiated memory used: 3920.30078125 MB 4999999950000000 after sum called memory used: 3920.3046875 MB Wall time: 17 s initing generator memory used: 50.359375 MB after generator initiated memory used: 50.359375 MB 4999999950000000 after sum called memory used: 50.109375 MB Wall time: 12.5 s ``` 声明一个迭代器很简单,[i for i in range(100000000)]就可以生成一个包含一亿元素的列表。每个元素在生成后都会保存到内存中,你通过代码可以看到,它们占用了巨量的内存,内存不够的话就会出现 OOM 错误。 ::: warning 🤔 了解下 yield()函数吧,他可以返回一个生成器对象,试试看懂这个 ::: ```python >>> def gen_list(lst): ... yield from lst ... >>> g = gen_list([1, 2, 3, 4]) >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) 4 >>> next(g) StopIteration ``` ## 思考题:python 会显示什么?为什么? ```python >>> s = [1, 2, 3, 4] >>> t = iter(s) >>> next(s) ______ >>> next(t) ______ >>> next(t) ______ >>> iter(s) ______ >>> next(iter(s)) ______ >>> next(iter(t)) ______ >>> next(iter(s)) ______ >>> next(iter(t)) ______ >>> next(t) ______ ``` ```python >>> r = range(6) >>> r_iter = iter(r) >>> next(r_iter) ______ >>> [x + 1 for x in r] ______ >>> [x + 1 for x in r_iter] ______ >>> next(r_iter) ______ >>> list(range(-2, 4)) # Converts an iterable into a list ______ ``` ## 任务 P10:实现 `count`,它接受一个迭代器 `t` 并返回该值 `x` 出现在的前 n 个元素中的次数 `t` ```python def count(t, n, x): """Return the number of times that x appears in the first n elements of iterator t. >>> s = iter([10, 9, 10, 9, 9, 10, 8, 8, 8, 7]) >>> count(s, 10, 9) 3 >>> s2 = iter([10, 9, 10, 9, 9, 10, 8, 8, 8, 7]) >>> count(s2, 3, 10) 2 >>> s = iter([3, 2, 2, 2, 1, 2, 1, 4, 4, 5, 5, 5]) >>> count(s, 1, 3) 1 >>> count(s, 4, 2) 3 >>> next(s) 2 >>> s2 = iter([4, 1, 6, 6, 7, 7, 8, 8, 2, 2, 2, 5]) >>> count(s2, 6, 6) 2 """ "*** YOUR CODE HERE ***" ``` P11:实现生成器函数 `scale(it, multiplier)`,它产生给定迭代的元素 `it`,按 `multiplier`. 同时也希望你尝试使用 `yield from` 语句编写这个函数! ```python def scale(it, multiplier): """Yield elements of the iterable it scaled by a number multiplier. >>> m = scale(iter([1, 5, 2]), 5) >>> type(m) >>> list(m) [5, 25, 10] >>> # generators allow us to represent infinite sequences!!! >>> def naturals(): ... i = 0 ... while True: ... yield i ... i += 1 >>> m = scale(naturals(), 2) >>> [next(m) for _ in range(5)] [0, 2, 4, 6, 8] """ "*** YOUR CODE HERE ***" ```