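PyPy is a project that Python developers created to make it easier to hack on Python. PyPy is also more flexible than CPython and easier to use and experiment with, so implementations of specific features for different situations can be worked out and put in place easily. The project's goal is to make PyPy easier to adapt to individual projects and easier to tailor than the C implementation of Python.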
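The Python community has long discussed removing the GIL (Global Interpreter Lock), and the various interpreters have each tried to solve the problem. Jython and IronPython removed it successfully with help from their underlying platforms, while efforts on CPython, such as gilectomy, have yet to produce results.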
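On August 14 the PyPy team posted that, since this year's February sprint, it has been experimenting with removing the GIL, hoping to achieve what IronPython and Jython did (by comparison, the team considers removing the GIL in CPython harder, because that also requires solving multi-threaded reference counting). So far the team has a GIL-less PyPy that can run very simple multi-threaded, parallelized programs, but more complicated programs may crash. The team plans to focus on this problem next.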
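However, because this work would complicate the PyPy code base and the team's day-to-day work, the PyPy team wants to gauge whether the community and commercial partners (rather than individual donors) are interested in it. If the team can secure a $100k contract, it will deliver a fully working GIL-less PyPy interpreter, possibly released separately from the default PyPy release. The post then goes into the technical details.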
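The post sparked lively discussion in the Python community. Some voiced support, some saw it as simply a search for funding with no obvious commercial value, and others argued that the GIL can be ignored during development and that this much effort is unnecessary.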
The original English post follows:
Let's remove the Global Interpreter Lock
Hello everyone
The Python community has been discussing removing the Global Interpreter Lock for a long time. There have been various attempts at removing it: Jython or IronPython successfully removed it with the help of the underlying platform, and some have yet to bear fruit, like gilectomy. Since our February sprint in Leysin, we have experimented with the topic of GIL removal in the PyPy project. We believe that the work done in IronPython or Jython can be reproduced with only a bit more effort in PyPy. Compared to that, removing the GIL in CPython is a much harder topic, since it also requires tackling the problem of multi-threaded reference counting. See the section below for further details.
As we announced at EuroPython, what we have so far is a GIL-less PyPy which can run very simple multi-threaded, nicely parallelized programs. At the moment, more complicated programs probably segfault. The remaining 90% (and another 90%) of the work lies in putting locks in strategic places so PyPy does not segfault during concurrent accesses to data structures.
Since such work would complicate the PyPy code base and our day-to-day work, we would like to judge the interest of the community and the commercial partners to make it happen (we are not looking for individual donations at this point). We estimate a total cost of $50k, out of which we already have backing for about 1/3 (with a possible 1/3 extra from the STM money, see below). This would give us a good shot at delivering a good proof-of-concept working PyPy with no GIL. If we can get a $100k contract, we will deliver a fully working PyPy interpreter with no GIL as a release, possibly separate from the default PyPy release.
People asked several questions, so I'll try to answer the technical parts here.
What would the plan entail?
We've already done the work on the Garbage Collector to allow running multi-threaded programs in RPython. "All" that is left is adding locks on mutable data structures everywhere in the PyPy codebase. Since it would significantly complicate our workflow, we require real interest in that topic, backed up by commercial contracts, in order to justify the added maintenance burden.
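To make "adding locks on mutable data structures" concrete, here is a minimal sketch in ordinary application-level Python. It is not RPython and not taken from the PyPy code base; `LockedList`, `append_if_absent`, and `run_workers` are hypothetical names chosen for illustration. It shows a compound check-then-modify operation being serialized by a single lock so that concurrent callers cannot interleave inside it.

```python
import threading


class LockedList:
    """A list wrapper whose compound operations are serialized by one lock.

    Hypothetical illustration only; not how PyPy implements its lists.
    """

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append_if_absent(self, value):
        # Check-then-append is two steps. Without the lock, two threads could
        # both see the value as absent and both append it.
        with self._lock:
            if value not in self._items:
                self._items.append(value)
                return True
            return False

    def snapshot(self):
        # Copy under the lock so the caller never sees a half-updated list.
        with self._lock:
            return list(self._items)


def run_workers(num_threads=8, count=100):
    """Have several threads race to insert the same values; return the result."""
    shared = LockedList()

    def worker():
        for value in range(count):
            shared.append_if_absent(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.snapshot()
```

Calling `run_workers()` returns a list in which each value appears exactly once, however the threads were scheduled. The cost is visible too: every access now pays for a lock acquisition, which is the kind of overhead and maintenance burden the post is weighing.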
Why did the STM effort not work out?
STM was a research project that proved that the idea is possible. However, the amount of user effort that is required to make programs run in a parallelizable way is significant, and we never managed to develop tools that would help in doing so. At the moment we're not sure if more work spent on tooling would improve the situation or if the whole idea is really doomed. The approach also ended up adding significant overhead on single threaded programs, so in the end it is very easy to make your programs slower. (We have some money left in the donation pot for STM which we are not using; according to the rules, we could declare the STM attempt failed and channel that money towards the present GIL removal proposal.)
Wouldn't subinterpreters be a better idea?
Python is a very mutable language - there is a ton of mutable state, and basic objects (classes, functions,...) that are compile-time in other languages are runtime and fully mutable in Python. In the end, sharing things between subinterpreters would be restricted to basic immutable data structures, which defeats the point. Subinterpreters suffer from the same problems as multiprocessing with no additional benefits. We believe that reducing mutability to implement subinterpreters is not viable without seriously impacting the semantics of the language (a conclusion which applies to many other approaches too).
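The multiprocessing comparison can be sketched with the standard `pickle` module, which is what `multiprocessing` uses to move objects between processes. Treating a pickle round trip as a stand-in for any isolation boundary is an assumption made here for illustration, and `send_across_boundary` is a hypothetical helper, not an API from the post.

```python
import pickle


def send_across_boundary(obj):
    """Model handing `obj` to an isolated interpreter: serialize, then rebuild.

    Hypothetical helper; a pickle round trip is only a stand-in for the
    isolation boundary discussed above.
    """
    return pickle.loads(pickle.dumps(obj))


config = {"workers": 4, "tags": ["a", "b"]}
received = send_across_boundary(config)

# The receiver holds an equal but independent copy, so its mutation
# is invisible to the sender.
received["tags"].append("c")
```

After this runs, `config["tags"]` is still `["a", "b"]` while `received["tags"]` has three entries. Mutable state is copied rather than shared, which is why only data that never changes can be shared safely across such a boundary.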
Why is it easier to do in PyPy than CPython?
Removing the GIL in CPython has two problems:
how do we guard access to mutable data structures with locks and
what to do with reference counting that needs to be guarded.
PyPy only has the former problem; the latter doesn't exist, due to a different garbage collector approach. Of course the first problem is a mess too, but at least we are already half-way there. Compared to Jython or IronPython, PyPy lacks some data structures that are provided by JVM or .NET, which we would need to implement, hence the problem is a little harder than on an existing multithreaded platform. However, there is good research and we know how that problem can be solved.
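The second problem, guarded reference counting, can be illustrated with a toy model. This is a sketch under stated assumptions: `ToyObject`, `churn`, and `stress` are hypothetical names, and a per-object Python lock is used only to show why the count needs protection. It is not CPython's actual object layout or a proposed implementation.

```python
import threading


class ToyObject:
    """Toy reference-counted object (hypothetical; not CPython's real layout)."""

    def __init__(self):
        self.refcount = 1      # the creator holds one reference
        self.freed = False
        self._lock = threading.Lock()

    def incref(self):
        # Read-modify-write on the counter must not interleave across threads.
        with self._lock:
            self.refcount += 1

    def decref(self):
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                # A lost update here would mean freeing too early or never.
                self.freed = True


def churn(obj, rounds):
    """Repeatedly take and drop a reference, as threads sharing `obj` would."""
    for _ in range(rounds):
        obj.incref()
        obj.decref()


def stress(num_threads=4, rounds=10000):
    obj = ToyObject()
    threads = [
        threading.Thread(target=churn, args=(obj, rounds))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj
```

With the lock in place, `stress()` always returns an object whose count is back at 1 and which has not been freed. Every reference taken or dropped pays for that guard, which is the cost a tracing garbage collector like PyPy's avoids by not keeping per-object counts at all.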
Best regards,
Maciej Fijalkowski