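The Zhihu post above mentions a term: OCR. OCR is a scraper's secret weapon against CAPTCHA images. The name stands for Optical Character Recognition. If you treat OCR as a black box, then whenever you feed an image into it, it recognizes the characters in the image and hands the text back to you. You can look up the history of OCR on the wiki yourself; here I will just outline how OCR works: input image → image preprocessing (median filtering to remove noise, binarization, segmentation) → compaction and rearrangement → font-library feature matching → characters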
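To make the preprocessing stage of that pipeline concrete, here is a minimal pure-Python sketch of median filtering, binarization, and a naive column-based segmentation. It assumes the image is already a 2D list of grayscale values (0 = black, 255 = white); the function names, the 3x3 window, and the threshold of 128 are my own choices for illustration, not taken from any particular OCR library, and the later stages (compaction, feature matching) are left out.

```python
from statistics import median_high


def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighborhood.

    Windows are clipped at the image border, so edge pixels use fewer samples.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            # median_high always returns an actual pixel value, even for
            # even-sized windows at the border.
            out[y][x] = median_high(window)
    return out


def binarize(img, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def segment_columns(binary):
    """Split the image into character spans at columns that contain no ink.

    Returns a list of (start, end) column ranges, end exclusive.
    """
    w = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(w)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans


def make_demo_image():
    """Two 3-column-wide dark bars plus one isolated noise pixel between them."""
    img = [[255] * 11 for _ in range(5)]
    for y in range(5):
        for x in (1, 2, 3, 7, 8, 9):
            img[y][x] = 0
    img[2][5] = 0  # salt-and-pepper style noise
    return img


if __name__ == "__main__":
    raw = make_demo_image()
    print("without filter:", segment_columns(binarize(raw)))
    print("with filter:   ", segment_columns(binarize(median_filter(raw))))
```

Without the filter, the single noise pixel is segmented as if it were a character of its own; after the median filter only the two real bars remain. Real CAPTCHAs are much messier (touching or rotated characters, interference lines), so treat this only as an illustration of what each preprocessing step is for.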
Windows Phone Driver
Selendroid
IOS Driver
Appium [supports iPhone, iPad, Android, and FirefoxOS]
(2) Headless (no GUI) browser: PhantomJS
Synopsis: PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
To be brief: PhantomJS is a web browser without a graphical user interface.
P.s.
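① For PhantomJS, I recommend downloading the precompiled binary from the official site, because the PhantomJS package in the repositories of Debian-family distros is incomplete ("It seems that it's not full-function."). What problems does the incomplete build cause? I have not run into any others, only this error: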
"selenium.common.exceptions.WebDriverException:
Message: Error - Unable to load Atom 'find_elements' from file ':/ghostdriver/third_party/webdriver-atoms/find_elements.js'"
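Seeing "selenium" at the start, I assumed it was a Selenium problem and googled for ages with no luck. I went back and read the error message more carefully, and only found the answer once I pasted the whole error into the search box. I really was too young, too naive; I should have just pasted it in and searched from the very beginning. - -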
It is worth noting that if your page uses a lot of AJAX on load, WebDriver may not know when it has completely loaded.
These days most web apps use AJAX techniques. When a page is loaded by the browser, the elements within that page may load at different time intervals.
This makes locating elements difficult: if an element is not yet present in the DOM, locating it raises NoSuchElementException, and interacting with an element that is present but not yet visible raises ElementNotVisibleException.
Using waits, we can solve this issue. Waiting provides some slack between the actions performed, mostly locating an element or any other operation with the element.
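The idea behind an explicit wait is just "poll until a condition holds or a timeout expires". Here is a minimal sketch of that loop in plain Python, so it runs without a browser. `wait_until`, `WaitTimeout`, and `FakePage` are hypothetical names made up for this illustration; in real Selenium code you would normally reach for `WebDriverWait(driver, timeout).until(...)` together with `expected_conditions` rather than rolling your own.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose element shows up only after an AJAX delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find(self, element_id):
        # Returns None while the simulated AJAX call is still "in flight".
        if time.monotonic() >= self._ready_at:
            return f"<element id={element_id!r}>"
        return None


if __name__ == "__main__":
    page = FakePage(ready_after=0.3)
    print(page.find("result"))                      # not there yet
    print(wait_until(lambda: page.find("result")))  # polls until it appears
```

The caller says what it is waiting for and how long it is willing to wait, instead of sprinkling fixed `time.sleep()` calls that are either too short (flaky) or too long (slow).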