An introduction to several Python decorators for data science - Bytepawn
    from random import randint, seed
    from time import time

    from sympy import isprime

    def generate_primes(domain: int = 1000*1000, num_attempts: int = 1000) -> list[int]:
        primes: set[int] = set()
        seed(time())
        for _ in range(num_attempts):
            candidate: int = randint(4, domain)
            if isprime(candidate):
                primes.add(candidate)
        return sorted(primes)

    print(len(generate_primes()))
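Then I realized that I could get a "free" speedup by running the original generate_primes() in parallel on all CPU threads. This is common enough to justify defining a @parallel decorator: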
    from typing import Callable

    from joblib import Parallel, cpu_count, delayed

    def parallel(func=None, args=(), merge_func=lambda x: x, parallelism=cpu_count()):
        def decorator(func: Callable):
            def inner(*args, **kwargs):
                # run the same call once per worker, then merge the per-worker results
                results = Parallel(n_jobs=parallelism)(
                    delayed(func)(*args, **kwargs) for _ in range(parallelism)
                )
                return merge_func(results)
            return inner
        if func is None:
            # decorator was used like @parallel(...)
            return decorator
        else:
            # decorator was used like @parallel, without parens
            return decorator(func)
    from itertools import chain

    @parallel(merge_func=lambda li: sorted(set(chain(*li))))
    def generate_primes(...): # same signature, nothing changes
        ... # same code, nothing changes

    print(len(generate_primes()))
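The only overhead is having to define a merge_func, which merges the results of the function's separate runs into a single result, so that the parallelism is hidden from outside callers of the decorated function (generate_primes() in this example). In this toy example I just merge the lists and use set() to make sure the primes are unique.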
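To make the merge step concrete, here is a minimal sketch of what such a merge_func does when it is handed one list per worker. The name merge_sorted_unique and the sample lists are made up for illustration; only the chain/set/sorted combination comes from the example above.

```python
from itertools import chain

def merge_sorted_unique(results):
    # flatten the per-worker lists, drop duplicates, return in ascending order
    return sorted(set(chain(*results)))

# pretend three workers each returned their own (overlapping) list of primes
worker_results = [[2, 3, 5], [3, 7], [5, 11, 2]]
merged = merge_sorted_unique(worker_results)
print(merged)
```

Because the merge happens inside the decorator, the caller still sees a single sorted list, just as if the function had run once.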
This example uses the process-based parallelism of joblib.Parallel(), which works well on Darwin + python3 + ipython and avoids contention on the Python Global Interpreter Lock (GIL).
    from socket import gethostname
    from typing import Callable

    production_servers = [...]

    def production(func: Callable):
        def inner(*args, **kwargs):
            if gethostname() in production_servers:
                return func(*args, **kwargs)
            else:
                print('This host is not a production server, skipping function decorated with @production...')
        return inner

    def development(func: Callable):
        def inner(*args, **kwargs):
            if gethostname() not in production_servers:
                return func(*args, **kwargs)
            else:
                print('This host is a production server, skipping function decorated with @development...')
        return inner

    def inactive(func: Callable):
        def inner(*args, **kwargs):
            print('Skipping function decorated with @inactive...')
        return inner

    @production
    def foo():
        print('Running in production, touching databases!')

    foo()

    @development
    def foo():
        print('Running in production, touching databases!')

    foo()

    @inactive
    def foo():
        print('Running in production, touching databases!')

    foo()
    Running in production, touching databases!
    This host is a production server, skipping function decorated with @development...
    Skipping function decorated with @inactive...
The most frequently used one is dag_vertica_create_table_as(), which runs a SELECT on our Vertica DWH and dumps the result into a table every night.
    dag = dag_vertica_create_table_as(
        table='my_aggregate_table',
        owner='Marton Trencseni (marton.trencseni@maf.ae)',
        schedule_interval='@daily',
        ...
        select="""
        SELECT ...
        FROM ...
        """
    )
    CREATE TABLE my_aggregate_table AS SELECT ...
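Over the past two years we have created almost 500 DAGs, so we scaled up our Airflow EC2 instance and introduced separate development and production environments. It would be nice to have a way to mark whether a DAG should run in the development or the production environment, to track this in code/Github, and to use the same mechanism to make sure DAGs do not accidentally run in the wrong environment.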
    # production_servers and development_servers are lists of hostnames defined elsewhere
    def deployable(func):
        def inner(*args, **kwargs):
            if 'deploy' in kwargs:
                if kwargs['deploy'].lower() in ['production', 'prod'] and gethostname() not in production_servers:
                    print('This host is not a production server, skipping...')
                    return
                if kwargs['deploy'].lower() in ['development', 'dev'] and gethostname() not in development_servers:
                    print('This host is not a development server, skipping...')
                    return
                if kwargs['deploy'].lower() in ['skip', 'none']:
                    print('Skipping...')
                    return
                del kwargs['deploy'] # to avoid func() throwing an unexpected keyword exception
            return func(*args, **kwargs)
        return inner
    @deployable
    def dag_vertica_create_table_as(...): # same signature, nothing changes
        ... # same code, nothing changes

    @deployable
    def dag_vertica_create_or_replace_view_as(...): # same signature, nothing changes
        ... # same code, nothing changes

    @deployable
    def dag_vertica_train_predict_model(...): # same signature, nothing changes
        ... # same code, nothing changes
    dag = dag_vertica_create_table_as(
        deploy='development', # the function will return None on production
        ...
    )
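To see the deploy keyword in action without Airflow, the following is a small self-contained sketch. The host lists and the build_dag() stand-in are hypothetical, invented for this example; the decorator is a compact equivalent of the one above (it uses kwargs.pop() instead of del), not the original code.

```python
from socket import gethostname

# Hypothetical host lists, made up for this sketch.
production_servers = ['prod-host.invalid']   # never matches a real hostname
development_servers = [gethostname()]        # pretend this machine is a dev server

def deployable(func):
    def inner(*args, **kwargs):
        # pop() removes 'deploy' so func() never sees the extra keyword
        deploy = kwargs.pop('deploy', '').lower()
        host = gethostname()
        if deploy in ('production', 'prod') and host not in production_servers:
            print('This host is not a production server, skipping...')
            return None
        if deploy in ('development', 'dev') and host not in development_servers:
            print('This host is not a development server, skipping...')
            return None
        if deploy in ('skip', 'none'):
            print('Skipping...')
            return None
        return func(*args, **kwargs)
    return inner

@deployable
def build_dag(table):
    # stand-in for a real DAG factory such as dag_vertica_create_table_as()
    return f'DAG for {table}'

dev_dag = build_dag(table='my_aggregate_table', deploy='development')
prod_dag = build_dag(table='my_aggregate_table', deploy='production')
skipped_dag = build_dag(table='my_aggregate_table', deploy='skip')
plain_dag = build_dag(table='my_aggregate_table')
print(dev_dag, prod_dag, skipped_dag, plain_dag)
```

On this pretend development host, only the 'development' call and the call without a deploy argument build anything; the other two return None.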
@redirect (stdout)
    from contextlib import redirect_stdout
    from io import StringIO
    from typing import Callable

    def redirect(func=None, line_print: Callable = None):
        def decorator(func: Callable):
            def inner(*args, **kwargs):
                with StringIO() as buf, redirect_stdout(buf):
                    func(*args, **kwargs)
                    output = buf.getvalue()
                lines = output.splitlines()
                if line_print is not None:
                    for line in lines:
                        line_print(line)
                else:
                    # digits needed for the largest line number; unlike floor(log(n, 10)) + 1,
                    # this also works for zero lines and for exact powers of ten
                    width = len(str(len(lines)))
                    for i, line in enumerate(lines):
                        i += 1
                        print(f'{i:0{width}}: {line}')
            return inner
        if func is None:
            # decorator was used like @redirect(...)
            return decorator
        else:
            # decorator was used like @redirect, without parens
            return decorator(func)
    @redirect
    def print_lines(num_lines):
        for i in range(num_lines):
            print(f'Line #{i+1}')

    print_lines(10)

    Output:

    01: Line #1
    02: Line #2
    03: Line #3
    04: Line #4
    05: Line #5
    06: Line #6
    07: Line #7
    08: Line #8
    09: Line #9
    10: Line #10
    lines = []

    def save_lines(line):
        lines.append(line)

    @redirect(line_print=save_lines)
    def print_lines(num_lines):
        for i in range(num_lines):
            print(f'Line #{i+1}')

    print_lines(3)
    print(lines)

    Output:

    ['Line #1', 'Line #2', 'Line #3']
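The capture step at the heart of @redirect can be looked at on its own. Here is a minimal sketch using only the standard library; capture_lines() and greet() are hypothetical names introduced for this example.

```python
from contextlib import redirect_stdout
from io import StringIO

def capture_lines(func, *args, **kwargs):
    """Run func and return everything it printed, split into lines."""
    with StringIO() as buf, redirect_stdout(buf):
        func(*args, **kwargs)
        # read the buffer before the with block closes it
        output = buf.getvalue()
    return output.splitlines()

def greet(name):
    print(f'Hello, {name}!')
    print('Done.')

captured = capture_lines(greet, 'world')
print(captured)
```

Note that the final print() runs after the with block has ended, so it reaches the real stdout again; the decorator above relies on the same ordering.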
    from sys import settrace
    from typing import Callable

    def stacktrace(func=None, exclude_files=['anaconda']):
        def tracer_func(frame, event, arg):
            co = frame.f_code
            func_name = co.co_name
            caller_filename = frame.f_back.f_code.co_filename
            if func_name == 'write':
                return # ignore write() calls from print statements
            for file in exclude_files:
                if file in caller_filename:
                    return # ignore in ipython notebooks
            # only the positional parameters: other locals do not exist yet on 'call'
            arg_names = co.co_varnames[:co.co_argcount]
            args = str(tuple([frame.f_locals[name] for name in arg_names]))
            if args.endswith(',)'):
                args = args[:-2] + ')'
            if event == 'call':
                print(f'--> Executing: {func_name}{args}')
                return tracer_func
            elif event == 'return':
                print(f'--> Returning: {func_name}{args} -> {repr(arg)}')
                return
        def decorator(func: Callable):
            def inner(*args, **kwargs):
                settrace(tracer_func)
                try:
                    func(*args, **kwargs)
                finally:
                    settrace(None) # stop tracing even if func() raises
            return inner
        if func is None:
            # decorator was used like @stacktrace(...)
            return decorator
        else:
            # decorator was used like @stacktrace, without parens
            return decorator(func)
    def b():
        print('...')

    @stacktrace
    def a(arg):
        print(arg)
        b()
        return 'world'

    a('foo')

    Output:

    --> Executing: a('foo')
    foo
    --> Executing: b()
    ...
    --> Returning: b() -> None
    --> Returning: a('foo') -> 'world'
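For readers who have not used sys.settrace() before, this is a stripped-down sketch of the mechanism @stacktrace builds on. It only records the names of called functions in a list instead of printing; record_calls(), outer_step() and inner_step() are hypothetical names for this example.

```python
import sys

calls = []

def record_calls(frame, event, arg):
    # the global trace function is invoked with 'call' for every new Python frame
    if event == 'call':
        calls.append(frame.f_code.co_name)
    return None  # no local trace function, so no 'line' or 'return' events

def inner_step():
    return 1

def outer_step():
    return inner_step() + 1

previous = sys.gettrace()      # remember any tracer that was already installed
sys.settrace(record_calls)
try:
    result = outer_step()
finally:
    sys.settrace(previous)     # always restore the earlier tracer

print(calls, result)
```

Returning the trace function itself from the 'call' event, as @stacktrace does, is what additionally enables the 'return' events it prints.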
    from typing import Callable

    def traceclass(cls: type):
        def make_traced(cls: type, method_name: str, method: Callable):
            def traced_method(*args, **kwargs):
                print(f'--> Executing: {cls.__name__}::{method_name}()')
                return method(*args, **kwargs)
            return traced_method
        # list() takes a snapshot of the names, since setattr() below modifies the class
        for name in list(cls.__dict__.keys()):
            if callable(getattr(cls, name)) and name != '__class__':
                setattr(cls, name, make_traced(cls, name, getattr(cls, name)))
        return cls

Usage:

    @traceclass
    class Foo:
        i: int = 0
        def __init__(self, i: int = 0):
            self.i = i
        def increment(self):
            self.i += 1
        def __str__(self):
            return f'This is a {self.__class__.__name__} object with i = {self.i}'

    f1 = Foo()
    f2 = Foo(4)
    f1.increment()
    print(f1)
    print(f2)

    Output:

    --> Executing: Foo::__init__()
    --> Executing: Foo::__init__()
    --> Executing: Foo::increment()
    --> Executing: Foo::__str__()
    This is a Foo object with i = 1
    --> Executing: Foo::__str__()
    This is a Foo object with i = 4