[学习交流] 【上海校区】Druid 0.17入门（4）—— 数据查询方式大全

本文介绍Druid查询数据的方式，首先我们保证数据已经成功载入。

Druid查询基于HTTP，Druid提供了查询视图，并对结果进行了格式化。

Druid提供了三种查询方式，SQL，原生JSON，CURL。

一、SQL查询

我们用wiki的数据为例

查询10条最多的页面编辑

SELECT page, COUNT(*) AS EditsFROM wikipediaWHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'GROUP BY pageORDER BY Edits DESCLIMIT 10我们在Query视图中操作

会有提示

选择Smart query limit会自动限制行数

Druid还提供了命令行查询sql 可以运行bin/dsql进行操作Welcome to dsql, the command-line client for Druid SQL.Type "\h" for help.dsql>

提交sql

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;┌──────────────────────────────────────────────────────────┬───────┐│ page │ Edits │├──────────────────────────────────────────────────────────┼───────┤│ Wikipedia:Vandalismusmeldung │ 33 ││ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 ││ Jeremy Corbyn │ 27 ││ Wikipedia:Administrators' noticeboard/Incidents │ 21 ││ Flavia Pennetta │ 20 ││ Total Drama Presents: The Ridonculous Race │ 18 ││ User talk:Dudeperson176123 │ 18 ││ Wikipédia:Le Bistro/12 septembre 2015 │ 18 ││ Wikipedia:In the news/Candidates │ 17 ││ Wikipedia:Requests for page protection │ 17 │└──────────────────────────────────────────────────────────┴───────┘Retrieved 10 rows in 0.06s.还可以通过Http发送SQLcurl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

可以得到如下结果

[ { "page": "Wikipedia:Vandalismusmeldung", "Edits": 33 }, { "page": "User:Cyde/List of candidates for speedy deletion/Subpage", "Edits": 28 }, { "page": "Jeremy Corbyn", "Edits": 27 }, { "page": "Wikipedia:Administrators' noticeboard/Incidents", "Edits": 21 }, { "page": "Flavia Pennetta", "Edits": 20 }, { "page": "Total Drama Presents: The Ridonculous Race", "Edits": 18 }, { "page": "User talk:Dudeperson176123", "Edits": 18 }, { "page": "Wikipédia:Le Bistro/12 septembre 2015", "Edits": 18 }, { "page": "Wikipedia:In the news/Candidates", "Edits": 17 }, { "page": "Wikipedia:Requests for page protection", "Edits": 17 }]更多SQL示例

时间查询

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeletedFROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'GROUP BY 1

分组查询

SELECT channel, page, SUM(added)FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'GROUP BY channel, pageORDER BY SUM(added) DESC

查询原始数据

SELECT user, pageFROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'LIMIT 5

定时查询

也可以在dsql里操作

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘Retrieved 1 row in 0.03s.二、原生JSON查询

Druid支持基于Json的查询

{ "queryType" : "topN", "dataSource" : "wikipedia", "intervals" : ["2015-09-12/2015-09-13"], "granularity" : "all", "dimension" : "page", "metric" : "count", "threshold" : 10, "aggregations" : [ { "type" : "count", "name" : "count" } ]}

把json粘贴到json 查询模式窗口

Json查询是通过向router和broker发送请求

curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>

Druid提供了丰富的查询方式

Aggregation查询Timeseries查询{ "queryType": "timeseries", "dataSource": "sample_datasource", "granularity": "day", "descending": "true", "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" }, { "type": "or", "fields": [ { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" }, { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" } ] } ] }, "aggregations": [ { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" }, { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" } ], "postAggregations": [ { "type": "arithmetic", "name": "sample_divide", "fn": "/", "fields": [ { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" }, { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" } ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]}TopN查询{ "queryType": "topN", "dataSource": "sample_data", "dimension": "sample_dim", "threshold": 5, "metric": "count", "granularity": "all", "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "dim1", "value": "some_value" }, { "type": "selector", "dimension": "dim2", "value": "some_other_val" } ] }, "aggregations": [ { "type": "longSum", "name": "count", "fieldName": "count" }, { "type": "doubleSum", "name": "some_metric", "fieldName": "some_metric" } ], "postAggregations": [ { "type": "arithmetic", "name": "average", "fn": "/", "fields": [ { "type": "fieldAccess", "name": "some_metric", "fieldName": "some_metric" }, { "type": "fieldAccess", "name": "count", "fieldName": "count" } ] } ], "intervals": [ "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000" ]}GroupBy查询{ "queryType": "groupBy", "dataSource": "sample_datasource", "granularity": "day", "dimensions": ["country", "device"], "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] }, "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "carrier", "value": "AT&T" }, { "type": "or", "fields": [ { "type": "selector", "dimension": "make", "value": "Apple" }, { "type": "selector", "dimension": "make", "value": "Samsung" } ] } ] }, "aggregations": [ { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }, { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" } ], "postAggregations": [ { "type": "arithmetic", "name": "avg_usage", "fn": "/", "fields": [ { "type": "fieldAccess", "fieldName": "data_transfer" }, { "type": "fieldAccess", "fieldName": "total_usage" } ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ], "having": { "type": "greaterThan", "aggregation": "total_usage", "value": 100 }}Metadata查询TimeBoundary 查询{ "queryType" : "timeBoundary", "dataSource": "sample_datasource", "bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set "filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional}SegmentMetadata查询{ "queryType":"segmentMetadata", "dataSource":"sample_datasource", "intervals":["2013-01-01/2014-01-01"]}DatasourceMetadata查询{ "queryType" : "dataSourceMetadata", "dataSource": "sample_datasource"}Search查询{ "queryType": "search", "dataSource": "sample_datasource", "granularity": "day", "searchDimensions": [ "dim1", "dim2" ], "query": { "type": "insensitive_contains", "value": "Ke" }, "sort" : { "type": "lexicographic" }, "intervals": [ "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000" ]}

查询建议

用Timeseries和TopN替代GroupBy

取消查询

DELETE /druid/v2/{queryId}curl -X DELETE "http://host:port/druid/v2/abc123"

查询失败

{ "error" : "Query timeout", "errorMessage" : "Timeout waiting for task.", "errorClass" : "java.util.concurrent.TimeoutException", "host" : "druid1.example.com:8083"}三、CURL

基于Http的查询

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty四、客户端查询

客户端查询是基于json的

具体查看 https://druid.apache.org/libraries.html

比如python查询的pydruid

from pydruid.client import *from pylab import pltquery = PyDruid(druid_url_goes_here, 'druid/v2')ts = query.timeseries( datasource='twitterstream', granularity='day', intervals='2014-02-02/p4w', aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')}, post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))}, filter=Dimension('first_hashtag') == 'sochi2014')df = query.export_pandas()df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20, title='Sochi 2014')plt.ylabel('avg tweet length (chars)')plt.show()

梦缠绕的时候 · 梦缠绕的时候

以上内容转载自网络
更多讯息欢迎添加小优：DKA-2018

帐号		自动登录	找回密码
密码			加入黑马

[学习交流] 【上海校区】Druid 0.17入门（4）—— 数据查询方式大全

1 个回复