Python multiprocessing issues

Contents

Why is multiprocessing slow in Python?
When should I use multiprocessing in Python?
Does multiprocessing make Python faster?
Is a multiprocessing queue safe?

> It appears the `multiprocessing`'s "spawn" mode doesn't actually use POSIX spawn, but instead uses fork+exec[1].

The documentation doesn't claim to use posix_spawn(). It only says: "starts a fresh python interpreter process".
https://docs.python.org/dev/library/multiprocessing.html#contexts-and-start-methods
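
A minimal sketch of the documented contexts API, assuming Python 3.8 or later: `multiprocessing.get_context("spawn")` returns a context whose `Process` and `Queue` use the spawn start method, regardless of the global default. The helper names `report_pid` and `child_pid_with` are hypothetical. Whether spawn is built on fork+exec or on posix_spawn() is an implementation detail that this code cannot observe.

```python
import multiprocessing
import os


def report_pid(queue):
    # Runs in the child; with "spawn" this is a freshly started interpreter.
    queue.put(os.getpid())


def child_pid_with(method):
    """Start one child using the given start method and return the child's PID."""
    ctx = multiprocessing.get_context(method)
    queue = ctx.Queue()
    process = ctx.Process(target=report_pid, args=(queue,))
    process.start()
    pid = queue.get()
    process.join()
    return pid


if __name__ == "__main__":
    print("available start methods:", multiprocessing.get_all_start_methods())
    print("parent", os.getpid(), "spawned child", child_pid_with("spawn"))
```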

I suggest closing the issue as "not a bug". I don't see anything wrong in the current documentation.

--

posix_spawn() is a C library function. On most operating systems it is implemented as fork+exec; macOS is the only one I know of that has a dedicated syscall for it. That said, posix_spawn() implementations are usually faster thanks to some optimizations.

Python has had os.posix_spawn() since Python 3.8.
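
As a rough illustration of the API (POSIX only; `run_python_snippet` is a hypothetical helper, and `os.waitstatus_to_exitcode()` requires Python 3.9 or later), `os.posix_spawn()` takes a path, an argument vector and an environment mapping, and returns the child's PID, which is then reaped with `os.waitpid()`:

```python
import os
import sys


def run_python_snippet(code):
    """Start a fresh interpreter with os.posix_spawn() and return its exit code."""
    argv = [sys.executable, "-c", code]
    # posix_spawn(path, argv, env) returns the child's PID (POSIX only).
    pid = os.posix_spawn(sys.executable, argv, os.environ)
    _, status = os.waitpid(pid, 0)
    # Convert the raw wait status into a normal exit code (Python 3.9+).
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    if hasattr(os, "posix_spawn"):
        print(run_python_snippet("raise SystemExit(3)"))  # 3
```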

The subprocess module can use os.posix_spawn() on macOS and on Linux (glibc 2.24 or newer) under some conditions:
https://docs.python.org/dev/whatsnew/3.8.html#optimizations

Sadly, it's not used by default, since close_fds=True remains the subprocess.Popen() default.

I'm open to using it on more platforms. However, os.posix_spawn() can only be used if it properly reports errors to the parent process, among other constraints and known bugs. It's a complex function!

--

Oh, about multiprocessing. Well, someone has to propose a patch! I don't know why multiprocessing calls _posixsubprocess.fork_exec() directly rather than using the subprocess module. It's also a complex module with many specific constraints.

posix_spawn() looks nice, but it cannot be used in many cases :-(

This is similar to the earlier issue 26903.

```
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> multiprocessing.cpu_count()
64
>>> multiprocessing.Pool(multiprocessing.cpu_count())
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 519, in _handle_workers
    cls._wait_for_updates(current_sentinels, change_notifier)
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 499, in _wait_for_updates
    wait(sentinels, timeout=timeout)
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 884, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "C:\Users\sasch\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 66
```
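
One possible workaround, sketched under assumptions: the traceback shows 64 workers producing 66 wait handles, so the pool appears to add two handles of its own, while `WaitForMultipleObjects` accepts at most 63. Capping the pool at 63 - 2 = 61 workers on Windows should therefore stay under the limit; this matches the documented rule that `concurrent.futures.ProcessPoolExecutor` accepts at most 61 `max_workers` on Windows. `capped_worker_count` and `WINDOWS_MAX_WORKERS` are hypothetical names, not part of `multiprocessing`.

```python
import multiprocessing
import os
import sys

# Assumption: 63 handles allowed, minus 2 used by the pool itself.
WINDOWS_MAX_WORKERS = 61


def capped_worker_count(requested=None, platform_name=sys.platform):
    """Return a Pool size that stays under the Windows wait-handle limit."""
    n = requested or os.cpu_count() or 1
    if platform_name == "win32":
        n = min(n, WINDOWS_MAX_WORKERS)
    return n


def square(x):
    return x * x


if __name__ == "__main__":
    with multiprocessing.Pool(capped_worker_count()) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```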

While writing a program using the multiprocessing library, I stumbled upon what appears to be a bug in how different platforms deal with private methods.

When a class has a private method that is the target of a multiprocessing process, the name is resolved correctly on Linux (Ubuntu 20.04.1, Python 3.8.10) but fails to resolve on macOS (Python 3.8.2 and 3.8.8) and Windows 10 (Python 3.9.6).

```python
import multiprocessing

class Test(object):
    def __init__(self):
        self.a = 1
        self._b = 2
        self.__c = 3
        self.run1()
        self.run2()
    def _test1(self, conn):
        conn.send(self._b)
    def __test2(self, conn):
        conn.send(self.__c)
    def run1(self):
        print("Running self._test1()")
        parent, child = multiprocessing.Pipe()
        process = multiprocessing.Process(target=self._test1, args=(child, ))
        process.start()
        print(parent.recv())
        process.join()
    def run2(self):
        print("Running self.__test2()")
        parent, child = multiprocessing.Pipe()
        process = multiprocessing.Process(target=self.__test2, args=(child, ))
        process.start()
        print(parent.recv())
        process.join()

if __name__ == "__main__":
    t = Test()
```


On Linux, this has the intended behavior of printing:
```
Running self._test1()
2
Running self.__test2()
3
```

However, on Windows 10, this raises an exception:
```
Running self._test1()
2
Running self.__test2()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: 'Test' object has no attribute '__test2'
```

A similar exception is also raised on macOS for this code.

It would therefore appear that attribute names starting with `__` are resolved differently on different platforms (at least within multiprocessing). My understanding is that because multiprocessing.Process is called within the class, the private method should be in scope and should therefore resolve correctly.
I'm aware that Python doesn't have truly private methods and instead mangles their names (Test.__test2 becomes Test._Test__test2), which explains why, on Windows, it cannot find an attribute named `__test2`.

My question, really, is which platform is correct here, and whether the inconsistency is intentional. I'd suggest Linux is more correct, since the process is started from within the object and the method should therefore be in scope; either way, the inconsistency between platforms may cause unintended issues.
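
The difference most likely comes from the start method rather than the platform itself. On Linux, the default start method in these Python versions is fork, which does not pickle the `Process` target; on Windows, and on macOS since Python 3.8, the default is spawn, which pickles the target and sends it to the child. In the Python versions reported here, a bound method is pickled as `getattr(instance, method.__name__)`, and `__name__` holds the unmangled `__test2` while the attribute is stored as `_Test__test2`, which matches the `AttributeError` above. The sketch below (the `_test2`, `private_target`, `public_target` and `survives_pickling` names are hypothetical) checks the pickle round-trip that spawn performs; renaming the target to a single leading underscore avoids the mangling. Newer Python versions may handle the mangled case differently, so its result is printed rather than assumed.

```python
import pickle


class Test:
    def __init__(self):
        self.__c = 3  # stored as self._Test__c

    def __test2(self, conn):
        # Stored in the class as _Test__test2, but __name__ stays "__test2".
        conn.send(self.__c)

    def _test2(self, conn):
        # Single leading underscore: no mangling, so getattr(obj, "_test2")
        # finds the method again after unpickling.
        conn.send(self.__c)

    def private_target(self):
        return self.__test2

    def public_target(self):
        return self._test2


def survives_pickling(target):
    """Return True if `target` round-trips through pickle, as spawn requires."""
    try:
        pickle.loads(pickle.dumps(target))
    except (AttributeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    t = Test()
    print("mangled __test2:", survives_pickling(t.private_target()))
    print("single-underscore _test2:", survives_pickling(t.public_target()))
```

With this rename, passing `target=self._test2` to `multiprocessing.Process` should work under every start method.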

Why is multiprocessing slow in Python?

The multiprocessing version is slower because it has to reload the model on every map call, since the mapped functions are treated as stateless. In some cases, you can avoid this by using the `initializer` argument of `multiprocessing.Pool` to load the model once per worker process.

When should I use multiprocessing in Python?

A Python multiprocessing Pool can be used to run a function in parallel over many input values, distributing the input data across processes (data parallelism).
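
A short data-parallel example, assuming a CPU-bound function (`count_primes_below` and `parallel_map` are illustrative names): `Pool.map()` splits the input list into chunks, runs them in worker processes, and returns the results in input order.

```python
import multiprocessing


def count_primes_below(n):
    """CPU-bound work: count primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def parallel_map(func, inputs, processes=None):
    # Each input is processed in a worker; results keep the input order.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(func, inputs)


if __name__ == "__main__":
    print(parallel_map(count_primes_below, [10, 100, 1000]))  # [4, 25, 168]
```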

Does multiprocessing make Python faster?

Multiprocessing therefore speeds up programs that are CPU-bound. If your program does a lot of I/O, threading can be more efficient, because most of the time the program is just waiting for I/O to complete. For CPU-bound work, however, multiprocessing is usually the better choice: each process has its own interpreter and GIL, so the work runs in parallel on multiple cores rather than merely concurrently.
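
One way to encode this rule of thumb, as a sketch: the hypothetical `make_executor()` helper returns a `concurrent.futures.ProcessPoolExecutor` for CPU-bound work and a `ThreadPoolExecutor` for I/O-bound work. Both share the `Executor` interface, so calling code can switch between them without other changes.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(workload, max_workers=None) -> Executor:
    """Hypothetical helper: processes for CPU-bound work, threads for I/O-bound work."""
    if workload == "cpu":
        # Separate processes each have their own interpreter and GIL,
        # so CPU-heavy Python code can run in parallel.
        return ProcessPoolExecutor(max_workers=max_workers)
    if workload == "io":
        # Threads are cheap and share memory; blocking I/O releases the GIL.
        return ThreadPoolExecutor(max_workers=max_workers)
    raise ValueError(f"unknown workload kind: {workload!r}")


if __name__ == "__main__":
    with make_executor("cpu") as pool:
        print(list(pool.map(pow, [2, 3], [10, 2])))  # [1024, 9]
```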

Is a multiprocessing queue safe?

`multiprocessing.Queue` is both thread-safe and process-safe. This means that multiple processes can call get() and put() on the same queue concurrently without race conditions.
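
A minimal sketch of several producer processes writing to one `multiprocessing.Queue` (`producer` and `collect` are hypothetical names). The parent drains the queue before calling `join()`, following the multiprocessing programming guidelines: a process that has put items on a queue does not terminate until they are flushed to the underlying pipe, so joining first can deadlock.

```python
import multiprocessing


def producer(queue, start, count):
    # Several producers may call put() on the same queue at the same time.
    for i in range(start, start + count):
        queue.put(i)


def collect(num_producers=3, per_producer=5):
    queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=producer, args=(queue, n * per_producer, per_producer))
        for n in range(num_producers)
    ]
    for p in procs:
        p.start()
    # Drain the queue first, then join the producers.
    items = [queue.get() for _ in range(num_producers * per_producer)]
    for p in procs:
        p.join()
    return sorted(items)


if __name__ == "__main__":
    print(collect())  # prints the integers 0 to 14 in order
```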
