Preface
I already touched on some IPython usage in my slides on advanced Python programming. Today we continue digging into IPython, from the basics on up; this post assumes that you, the reader, already know IPython and have been using it for a while.
%run
This magic command executes the code in a script and stores the resulting variables in IPython's interactive namespace:
```
$ cat t.py
# coding=utf-8
l = range(5)

$ ipython
In [1]: %run t.py  # the `%` is optional here

In [2]: l  # `l` was defined inside t.py, but it is now usable directly
Out[2]: [0, 1, 2, 3, 4]
```
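To get a feel for what `%run` does (execute a file, then pull its globals into your namespace), the standard library's `runpy` behaves similarly; a minimal sketch, not from the original post:

```python
import os
import runpy
import tempfile

# write a tiny script, playing the role of t.py above
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write("l = list(range(5))\n")
    path = f.name

# run_path executes the file and returns its global namespace as a dict;
# %run does this and then merges the result into the interactive namespace
namespace = runpy.run_path(path)
os.unlink(path)

print(namespace['l'])  # [0, 1, 2, 3, 4]
```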
alias
```
In [3]: %alias largest ls -1sSh | grep %s

In [4]: largest to
total 42M
 20K tokenize.py
 16K tokenize.pyc
8.0K story.html
4.0K autopep8
4.0K autopep8.bak
4.0K story_layout.html
```
PS: aliases need to be stored, otherwise they are gone after you restart IPython:
```
In [5]: %store largest
Alias stored: largest (ls -1sSh | grep %s)
```
The next time you start IPython, restore it with `%store -r`.
bookmark – aliases for directories
```
In [2]: %pwd
Out[2]: u'/home/vagrant'

In [3]: %bookmark dongxi ~/shire/dongxi_code

In [4]: %cd dongxi
/home/vagrant/shire/dongxi_code

In [5]: %pwd
Out[5]: u'/home/vagrant/shire/dongxi_code'
```
ipcluster – parallel computing
IPython actually ships with convenient parallel-computing support. Let's see what it offers through a word-count example; first download a sample text:

```
$ wget http://www.gutenberg.org/files/27287/27287-0.txt
```
The first version is the direct, serial approach everyone is used to.
```
In [1]: import re

In [2]: import io

In [3]: non_word = re.compile(r'[\W\d]+', re.UNICODE)

In [4]: common_words = {
   ...: 'the','of','and','in','to','a','is','it','that','which','as','on','by',
   ...: 'be','this','with','are','from','will','at','you','not','for','no','have',
   ...: 'i','or','if','his','its','they','but','their','one','all','he','when',
   ...: 'than','so','these','them','may','see','other','was','has','an','there',
   ...: 'more','we','footnote', 'who', 'had', 'been', 'she', 'do', 'what',
   ...: 'her', 'him', 'my', 'me', 'would', 'could', 'said', 'am', 'were', 'very',
   ...: 'your', 'did', 'not',
   ...: }

In [5]: def yield_words(filename):
   ...:     import io
   ...:     with io.open(filename, encoding='latin-1') as f:
   ...:         for line in f:
   ...:             for word in line.split():
   ...:                 word = non_word.sub('', word.lower())
   ...:                 if word and word not in common_words:
   ...:                     yield word
   ...:

In [6]: def word_count(filename):
   ...:     word_iterator = yield_words(filename)
   ...:     counts = defaultdict(int)
   ...:     while True:
   ...:         try:
   ...:             word = next(word_iterator)
   ...:         except StopIteration:
   ...:             break
   ...:         else:
   ...:             counts[word] += 1
   ...:     return counts
   ...:

In [6]: from collections import defaultdict  # silly me, forgot to include this earlier...

In [7]: %time counts = word_count(filename)  # filename = '27287-0.txt', downloaded above
CPU times: user 88.5 ms, sys: 2.48 ms, total: 91 ms
Wall time: 89.3 ms
```
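As an aside, on Python 3 the same serial count can be written more compactly with `collections.Counter`; a sketch (the regex mirrors the one above, the stop-word set is trimmed for brevity):

```python
import re
from collections import Counter

non_word = re.compile(r'[\W\d]+', re.UNICODE)
common_words = {'the', 'of', 'and', 'a', 'to'}  # trimmed stop-word set

def count_words(lines):
    """Count cleaned, non-stop-word tokens in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = non_word.sub('', token.lower())
            if word and word not in common_words:
                counts[word] += 1
    return counts

counts = count_words(["The hobbits of the Shire", "a ring to rule"])
# counts: {'hobbits': 1, 'shire': 1, 'ring': 1, 'rule': 1}
```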
Now let's run it with IPython's parallel machinery:
```
$ ipcluster start -n 2  # well, my Mac has two cores
```
First, a walkthrough of IPython's parallel API:
```
In [1]: from IPython.parallel import Client  # the %px* magics only work after this import

In [2]: rc = Client()

In [3]: rc.ids  # two ids, because I started two engines
Out[3]: [0, 1]

In [4]: %autopx  # without autopx, every statement needs a `%px xxx` prefix
%autopx enabled

In [5]: import os  # without autopx this would be `%px import os`

In [6]: print os.getpid()  # the pids of the two engine processes
[stdout:0] 62638
[stdout:1] 62636

In [7]: %pxconfig --targets 1  # this magic is unavailable while autopx is on
[stderr:0] ERROR: Line magic function `%pxconfig` not found.
[stderr:1] ERROR: Line magic function `%pxconfig` not found.

In [8]: %autopx  # running it again turns autopx off
%autopx disabled

In [10]: %pxconfig --targets 1  # target engine 1, so the code below runs only on the second engine

In [11]: %%px --noblock  # run a block of code without blocking
   ....: import time
   ....: time.sleep(1)
   ....: os.getpid()
   ....:
Out[11]: <AsyncResult: execute>

In [12]: %pxresult  # see -- only the second engine's pid came back
Out[1:21]: 62636

In [13]: v = rc[:]  # use all engines; IPython gives fine-grained control over what each engine runs

In [14]: with v.sync_imports():  # import the time module on every engine
   ....:     import time
   ....:
importing time on engine(s)

In [15]: def f(x):
   ....:     time.sleep(1)
   ....:     return x * x
   ....:

In [16]: v.map_sync(f, range(10))  # run synchronously
Out[16]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [17]: r = v.map(f, range(10))  # run asynchronously

In [18]: r.ready(), r.elapsed  # just like celery
Out[18]: (True, 5.87735)

In [19]: r.get()  # fetch the results
Out[19]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
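If you only want the `map_sync`/`map` split without spinning up a cluster, the standard library's `concurrent.futures` offers the same blocking/non-blocking pattern; a rough analogue (threads here instead of engine processes, purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    # consuming map() blocks for all results, like view.map_sync()
    results = list(pool.map(f, range(10)))
    # submit() returns a future immediately, like view.map()'s AsyncResult
    future = pool.submit(f, 7)
    answer = future.result()  # .result() blocks, like r.get()
```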
Now, back to the word count:
```
In [20]: def split_text(filename):
   ....:     text = open(filename).read()
   ....:     lines = text.splitlines()
   ....:     nlines = len(lines)
   ....:     n = 10
   ....:     block = nlines//n
   ....:     for i in range(n):
   ....:         chunk = lines[i*block:(i+1)*block]
   ....:         with open('count_file%i.txt' % i, 'w') as f:
   ....:             f.write('\n'.join(chunk))
   ....:     cwd = os.path.abspath(os.getcwd())
   ....:     fnames = [os.path.join(cwd, 'count_file%i.txt' % i) for i in range(n)]  # not using glob, to be precise
   ....:     return fnames

In [21]: from IPython import parallel

In [22]: rc = parallel.Client()

In [23]: view = rc.load_balanced_view()

In [24]: v = rc[:]

In [25]: v.push(dict(
   ....:     non_word=non_word,
   ....:     yield_words=yield_words,
   ....:     common_words=common_words
   ....: ))
Out[25]: <AsyncResult: _push>

In [26]: fnames = split_text(filename)

In [27]: def count_parallel():
   .....:     pcounts = view.map(word_count, fnames)
   .....:     counts = defaultdict(int)
   .....:     for pcount in pcounts.get():
   .....:         for k, v in pcount.iteritems():
   .....:             counts[k] += v
   .....:     return counts, pcounts
   .....:

In [28]: %time counts, pcounts = count_parallel()  # this timing includes the aggregation step
CPU times: user 47.6 ms, sys: 6.67 ms, total: 54.3 ms  # quite a bit faster than the serial run, no?
Wall time: 106 ms

In [29]: pcounts.elapsed, pcounts.serial_time, pcounts.wall_time
Out[29]: (0.104384, 0.13980499999999998, 0.104384)
```
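The aggregation step inside `count_parallel()`, folding the per-chunk dicts into one total, can be isolated as a small helper; a sketch with made-up data:

```python
from collections import defaultdict

def merge_counts(partials):
    """Fold a sequence of per-chunk word-count dicts into one total."""
    totals = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            totals[word] += n
    return dict(totals)

merged = merge_counts([{'shire': 2, 'ring': 1}, {'ring': 3, 'mordor': 1}])
# merged: {'shire': 2, 'ring': 4, 'mordor': 1}
```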
For more on parallel computing, see: Parallel Computing with IPython