Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official Java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).

kafka-python is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0). Some features are only enabled on newer brokers. For example, fully coordinated consumer groups (i.e., dynamic partition assignment to multiple consumers in the same group) require 0.9+ kafka brokers. Supporting this feature for earlier broker releases would require writing and maintaining custom leadership election and membership / health check code (perhaps using ZooKeeper or Consul). For older brokers, you can achieve something similar by manually assigning different partitions to each consumer instance with configuration management tools like Chef, Ansible, etc. This approach works fine, though it does not support rebalancing on failures. See <https://kafka-python.readthedocs.io/en/master/compatibility.html> for more details.

Please note that the master branch may contain unreleased features. For release documentation, please see readthedocs and/or Python's inline help.

>>> pip install kafka-python
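A minimal sketch of that manual-assignment approach, assuming each deployed instance is handed a distinct INSTANCE_ID by the configuration management tool (the ID, the topic name, and the one-partition-per-instance mapping here are illustrative, not part of the library):

from kafka import KafkaConsumer, TopicPartition

# Hypothetical: Chef/Ansible templates a different INSTANCE_ID into each deployment
INSTANCE_ID = 0
consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
# Each instance consumes only the partition matching its ID; no group coordination needed
consumer.assign([TopicPartition('my_favorite_topic', INSTANCE_ID)])
for msg in consumer:
    print(msg)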
KafkaConsumer

KafkaConsumer is a high-level message consumer, intended to operate as similarly as possible to the official Java client. Full support for coordinated consumer groups requires kafka brokers that support the Group APIs: kafka v0.9+.

See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html> for API and configuration details.

The consumer iterator returns ConsumerRecords, which are simple namedtuples that expose basic message attributes: topic, partition, offset, key, and value:

>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic')
>>> for msg in consumer:
...     print(msg)

>>> # join a consumer group for dynamic partition assignment and offset commits
>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')
>>> for msg in consumer:
...     print(msg)
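Because each message is a plain namedtuple, its fields can also be read individually; a small sketch using the attribute names listed above:

>>> for msg in consumer:
...     print('%s:%d:%d: key=%s value=%s' % (msg.topic, msg.partition,
...                                          msg.offset, msg.key, msg.value))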
>>> # manually assign the partition list for the consumer
>>> from kafka import TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> msg = next(consumer)

>>> # Deserialize msgpack-encoded values (requires the msgpack package)
>>> import msgpack
>>> consumer = KafkaConsumer(value_deserializer=msgpack.loads)
>>> consumer.subscribe(['msgpackfoo'])
>>> for msg in consumer:
... assert isinstance(msg.value, dict)
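The producer-side counterpart is symmetric; a sketch, assuming the same msgpack package (msgpack.dumps serializes a Python object to bytes; KafkaProducer itself is covered in the next section):

>>> import msgpack
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(value_serializer=msgpack.dumps)
>>> producer.send('msgpackfoo', {'answer': 42})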
>>> # Access record headers. The returned value is a list of tuples
>>> # with str, bytes for key and value
>>> for msg in consumer:
...     print(msg.headers)

>>> # Get consumer metrics
>>> metrics = consumer.metrics()
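metrics() returns a plain nested dict, keyed by metric group and then metric name, so it can be inspected directly; a sketch (the group and metric names present vary by broker and workload):

>>> for group, group_metrics in consumer.metrics().items():
...     for name, value in group_metrics.items():
...         print(group, name, value)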
KafkaProducer

KafkaProducer is a high-level, asynchronous message producer, intended to operate as similarly as possible to the official Java client. See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html> for more details.

>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='localhost:1234')
>>> for _ in range(100):
...     producer.send('foobar', b'some_message_bytes')

>>> # Block until a single message is sent (or timeout)
>>> future = producer.send('foobar', b'another_message')
>>> result = future.get(timeout=60)
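Instead of blocking on the future, callbacks can be attached to it via add_callback and add_errback; a sketch with hypothetical handler names:

>>> def on_send_success(record_metadata):
...     # RecordMetadata carries the topic, partition and offset assigned to the message
...     print(record_metadata.topic, record_metadata.partition, record_metadata.offset)

>>> def on_send_error(excp):
...     # Invoked if the send ultimately fails, e.g. after retries are exhausted
...     print('send failed:', excp)

>>> producer.send('foobar', b'yet_another_message').add_callback(on_send_success).add_errback(on_send_error)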
>>> # Block until all pending messages are at least put on the network
>>> # NOTE: This does not guarantee delivery or success! It is really
>>> # only useful if you configure internal batching using linger_ms
>>> producer.flush()
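A sketch of the batching setup that note refers to: with linger_ms configured, sends are buffered briefly so they can be batched together, and flush() pushes out whatever is still sitting in the buffer (the 10 ms value is arbitrary):

>>> producer = KafkaProducer(bootstrap_servers='localhost:1234', linger_ms=10)
>>> for i in range(100):
...     producer.send('foobar', b'batched message')
>>> producer.flush()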
>>> # Use a key for hashed-partitioning
>>> producer.send('foobar', key=b'foo', value=b'bar')
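Because partitioning hashes the key, sends with the same key land in the same partition; a sketch that checks this via the RecordMetadata returned by the futures:

>>> future1 = producer.send('foobar', key=b'foo', value=b'first')
>>> future2 = producer.send('foobar', key=b'foo', value=b'second')
>>> assert future1.get(timeout=10).partition == future2.get(timeout=10).partition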
>>> # Serialize json messages
>>> import json
>>> producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode('utf-8'))
>>> producer.send('fizzbuzz', {'foo': 'bar'})
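On the consuming side, a matching value_deserializer undoes this; a sketch reusing the topic from the producer example above:

>>> import json
>>> consumer = KafkaConsumer('fizzbuzz',
...                          value_deserializer=lambda m: json.loads(m.decode('utf-8')))
>>> for msg in consumer:
...     assert isinstance(msg.value, dict)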
>>> # Serialize string keys
>>> producer = KafkaProducer(key_serializer=str.encode)
>>> producer.send('flipflap', key='ping', value=b'1234')
>>> # Compress messages
>>> producer = KafkaProducer(compression_type='gzip')
>>> for i in range(1000):
...     producer.send('foobar', b'msg %d' % i)

>>> # Include record headers. The format is list of tuples with string key
>>> # and bytes value.
>>> producer.send('foobar', value=b'c29tZSB2YWx1ZQ==', headers=[('content-encoding', b'base64')])

>>> # Get producer performance metrics
>>> metrics = producer.metrics()
Example:
#!/usr/bin/env python
import threading, logging, time
import multiprocessing

from kafka import KafkaConsumer, KafkaProducer


class Producer(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()

    def stop(self):
        self.stop_event.set()

    def run(self):
        producer = KafkaProducer(bootstrap_servers='localhost:9092')

        # Publish two test messages per second until asked to stop
        while not self.stop_event.is_set():
            producer.send('my-topic', b"test")
            producer.send('my-topic', b"\xc2Hola, mundo!")
            time.sleep(1)

        producer.close()


class Consumer(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.stop_event = multiprocessing.Event()

    def stop(self):
        self.stop_event.set()

    def run(self):
        # consumer_timeout_ms makes the iterator return periodically, so the
        # stop flag is checked even when no messages are arriving
        consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                                 auto_offset_reset='earliest',
                                 consumer_timeout_ms=1000)
        consumer.subscribe(['my-topic'])

        while not self.stop_event.is_set():
            for message in consumer:
                print(message)
                if self.stop_event.is_set():
                    break

        consumer.close()


def main():
    tasks = [
        Producer(),
        Consumer()
    ]

    # Run producer and consumer side by side for 10 seconds, then shut down
    for t in tasks:
        t.start()

    time.sleep(10)

    for task in tasks:
        task.stop()

    for task in tasks:
        task.join()


if __name__ == "__main__":
    logging.basicConfig(
        format='%(asctime)s.%(msecs)s:%(name)s:%(thread)d:%(levelname)s:%(process)d:%(message)s',
        level=logging.INFO
    )
    main()