Pitfalls when converting between str and bytes in Python

Question

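What are the common pitfalls when converting between strings and bytes in Python, and how do you handle encoding mismatches? (Python interview question)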

Yahuda · Accepted Answer

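Common pitfalls:

1. Implicit encoding/decoding errors. Python 2 silently converted between str (bytes) and unicode when the two were mixed, using ASCII, which caused hard-to-find errors. Python 3 keeps str and bytes strictly separate and never converts automatically.

    s = 'hello'
    b = b'world'
    # s + b  # TypeError: can only concatenate str (not "bytes") to str

2. Mismatched encodings. Bytes must be decoded with the same encoding that produced them.

    s = '中'
    b = s.encode('gbk')    # b'\xd6\xd0'
    # b.decode('utf-8')    # UnicodeDecodeError!
    b.decode('gbk')        # '中'

3. BOM markers. UTF-16 files and UTF-8 files saved with a BOM carry an extra prefix.

    b'\xef\xbb\xbfhello'.decode('utf-8-sig')  # 'hello' (BOM stripped automatically)

4. Non-ASCII paths. The Windows file system may use a different encoding.

Best practices:

- Use UTF-8 consistently.
- Encode and decode at I/O boundaries (decode to str as early as possible, encode to bytes as late as possible).
- Use open(..., encoding='utf-8')...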
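One possible way to handle an encoding mismatch, when the source encoding of some bytes is not known for certain, is to try a short list of candidate encodings in order. The sketch below is only an illustration: the helper name decode_bytes and the candidate list are hypothetical and not part of the original answer. UTF-8 is tried first on purpose, because GBK may accept bytes that are really UTF-8 and return garbled text without raising an error.

```python
import os
import tempfile


def decode_bytes(data, candidates=("utf-8-sig", "gbk")):
    """Hypothetical helper: try each candidate encoding in order.

    Returns (text, label), where label names the encoding that worked.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# GBK bytes fail as UTF-8, so the helper falls through to 'gbk'.
gbk_text, gbk_enc = decode_bytes("中".encode("gbk"))

# 'utf-8-sig' strips a leading BOM and also reads plain UTF-8.
bom_text, bom_enc = decode_bytes(b"\xef\xbb\xbfhello")

# I/O boundary: name the encoding explicitly when writing and reading.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("中文")
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Inside the program everything stays str; bytes appear only where data enters or leaves. The lossy fallback keeps the program running at the cost of losing information, so it is probably better suited to logging or display than to data that will be written back.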
