Auto-Solve Google CAPTCHA
Here we will be using one of the best-known web-automation tools, Selenium, along with Python.
Here we will be solving Google reCAPTCHA v2, the "I'm not a robot" checkbox backed by image challenges. (reCAPTCHA v3 is score-based and shows no challenge, so the flow below does not apply to it.)
Workflow :
1) The page shows a small checkbox that needs to be ticked to move forward.
2) When we click on the checkbox, an image challenge opens.
3) The images are different every time, and the object that needs to be selected also differs, so we can't automate this route. The hack is to click on the HEADPHONES icon (also known as "Get an audio challenge").
4) As soon as we click on it, an audio challenge opens.
It is an audio challenge in which we have to listen to what the clip is saying and then type it into the box.
For this we will use an audio-to-text converter, which listens to the audio and converts it into text. That text is what we submit.
FOR SPEECH TO TEXT WE WILL BE USING : https://speech-to-text-demo.ng.bluemix.net/
CODING PART :
Libraries we will be using :
```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import os, sys
import time, requests
```
1) Now let's visit a demo website that has a Google CAPTCHA. One such demo site is: https://www.google.com/recaptcha/api2/demo
```python
# Selenium 4 takes the driver path through a Service object
from selenium.webdriver.chrome.service import Service

# Initialising the Chrome driver
chrome_option = webdriver.ChromeOptions()
chrome_option.add_argument('--disable-notifications')
chrome_option.add_argument('--mute-audio')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_option)

# Going to the website where the Google CAPTCHA is present
driver.get('https://www.google.com/recaptcha/api2/demo')
```
2) Now let's search for the Google CAPTCHA on that website using its class name, which is the same on all websites, whereas the XPath differs from website to website:
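```python
# Searching for the Google CAPTCHA container
googleClass = driver.find_elements(By.CLASS_NAME, 'g-recaptcha')[0]
time.sleep(2)
# Click the checkbox iframe to open the challenge
googleClass.find_element(By.TAG_NAME, 'iframe').click()
time.sleep(2)
# Collect every iframe on the page (despite its name, this is a list, not a length)
allIframesLen = driver.find_elements(By.TAG_NAME, 'iframe')
time.sleep(1)
```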
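The fixed `time.sleep` calls above wait the full duration even when the page is ready sooner, and can still be too short on a slow connection. A minimal sketch of a polling alternative follows; `wait_until` and its parameters are hypothetical names introduced here, not part of the original script. Selenium's own `WebDriverWait` serves the same purpose if you prefer the built-in route.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if none arrived in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Possible use with the script's driver (find_elements returns an empty,
# falsy list until at least one iframe exists):
# allIframesLen = wait_until(lambda: driver.find_elements(By.TAG_NAME, 'iframe'))
```

Because the helper only needs a callable, it can wrap any of the lookups in this script without changing them.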
3) Once we have the captcha, we iterate through the iframes on the page to find the audio button, and click it once found.
```python
# Searching for the audio button
delayTime = 2  # seconds to wait for elements (as in the full source)
audioBtnFound = False
audioBtnIndex = -1
for index in range(len(allIframesLen)):
    # Always start from the top-level page before entering the next iframe
    driver.switch_to.default_content()
    iframe = driver.find_elements(By.TAG_NAME, 'iframe')[index]
    driver.switch_to.frame(iframe)
    driver.implicitly_wait(delayTime)
    try:
        # find_element raises if the button is not in this iframe
        driver.find_element(By.ID, 'recaptcha-audio-button').click()
        audioBtnFound = True
        audioBtnIndex = index
        break
    except Exception:
        pass
```
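The search above boils down to "try each iframe in turn and remember the index of the first one where the lookup does not raise". A small sketch of that pattern in isolation, with no browser involved, is shown here; `first_matching_index` and `probe` are hypothetical names used only for illustration.

```python
def first_matching_index(count, probe):
    """Return the first index in range(count) for which probe(index) succeeds.

    A probe "succeeds" when it returns without raising; any exception is
    treated as "not in this frame". Returns -1 when no index matches.
    """
    for index in range(count):
        try:
            probe(index)
        except Exception:
            continue
        return index
    return -1
```

In the script, the probe would be the step that switches into iframe `index` and clicks `recaptcha-audio-button`, and a result of -1 corresponds to `audioBtnFound` staying `False`.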
4) Once we are in the audio challenge, we download the audio, upload it to the speech-to-text website to get the text, and then submit that text in the challenge.
```python
# Uses the settings defined in the full source:
# filename, googleIBMLink, audioToTextDelay

def audioToText(mp3Path):
    # Open the speech-to-text site in a new tab
    driver.execute_script('''window.open("","_blank");''')
    driver.switch_to.window(driver.window_handles[1])
    driver.get(googleIBMLink)
    delayTime = 5
    # Upload the downloaded audio through the page's file input
    btn = driver.find_element(By.XPATH, '//*[@id="root"]/div/input')
    btn.send_keys(mp3Path)
    # Wait for the upload to finish
    time.sleep(delayTime)
    # Wait for the audio-to-text conversion
    time.sleep(audioToTextDelay)
    text = driver.find_element(By.XPATH, '//*[@id="root"]/div/div[7]/div/div/div').find_elements(By.TAG_NAME, 'span')
    result = " ".join([each.text for each in text])
    # Close the tab and go back to the captcha page
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
    return result

def saveFile(content, filename):
    # Write the streamed response to disk in chunks
    with open(filename, "wb") as handle:
        for data in content.iter_content(chunk_size=8192):
            handle.write(data)

# When we find the audio button
if audioBtnFound:
    try:
        while True:
            href = driver.find_element(By.ID, 'audio-source').get_attribute('src')
            audio = requests.get(href, stream=True)
            saveFile(audio, filename)
            response = audioToText(os.path.join(os.getcwd(), filename))
            print(response)
            # Return to the challenge iframe and type the transcribed text
            driver.switch_to.default_content()
            iframe = driver.find_elements(By.TAG_NAME, 'iframe')[audioBtnIndex]
            driver.switch_to.frame(iframe)
            inputbtn = driver.find_element(By.ID, 'audio-response')
            inputbtn.send_keys(response)
            inputbtn.send_keys(Keys.ENTER)
            time.sleep(2)
            # No visible error message means the answer was accepted
            errorMsg = driver.find_elements(By.CLASS_NAME, 'rc-audiochallenge-error-message')[0]
            if errorMsg.text == "" or errorMsg.value_of_css_property('display') == 'none':
                print("Success")
                break
    except Exception as e:
        print(e)
        print('Caught. Need to change proxy now')
else:
    print('Button not found. This should not happen.')
```
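The `while True` loop above keeps requesting new audio for as long as the error message is shown, with no upper bound. A hedged sketch of a capped variant follows; `solve_with_retries`, `attempt` and `max_attempts` are hypothetical names, and `attempt` stands in for one pass of download, transcribe and submit that returns `True` when the answer is accepted.

```python
def solve_with_retries(attempt, max_attempts=3):
    """Call attempt() up to max_attempts times; stop at the first truthy result."""
    for number in range(1, max_attempts + 1):
        if attempt():
            print(f"Success on attempt {number}")
            return True
    print(f"Gave up after {max_attempts} attempts")
    return False
```

Capping the attempts keeps a run of bad transcriptions from turning into an endless stream of requests from one IP, which is the situation the note at the end of this post warns about.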
Full Source Code :
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import os, sys
import time, requests

delayTime = 2
audioToTextDelay = 10
filename = 'audioToText.mp3'
websiteUrl = 'https://www.google.com/recaptcha/api2/demo'
googleIBMLink = 'https://speech-to-text-demo.ng.bluemix.net/'

chrome_option = webdriver.ChromeOptions()
chrome_option.add_argument('--disable-notifications')
chrome_option.add_argument('--mute-audio')

def audioToText(mp3Path):
    # Open the speech-to-text site in a new tab
    driver.execute_script('''window.open("","_blank");''')
    driver.switch_to.window(driver.window_handles[1])
    driver.get(googleIBMLink)
    delayTime = 5
    # Upload the downloaded audio through the page's file input
    btn = driver.find_element(By.XPATH, '//*[@id="root"]/div/input')
    btn.send_keys(mp3Path)
    # Wait for the upload to finish
    time.sleep(delayTime)
    # Wait for the audio-to-text conversion
    time.sleep(audioToTextDelay)
    text = driver.find_element(By.XPATH, '//*[@id="root"]/div/div[7]/div/div/div').find_elements(By.TAG_NAME, 'span')
    result = " ".join([each.text for each in text])
    # Close the tab and go back to the captcha page
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
    return result

def saveFile(content, filename):
    # Write the streamed response to disk in chunks
    with open(filename, "wb") as handle:
        for data in content.iter_content(chunk_size=8192):
            handle.write(data)

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_option)
driver.get(websiteUrl)
time.sleep(1)

# Find the captcha container and click its checkbox iframe
googleClass = driver.find_elements(By.CLASS_NAME, 'g-recaptcha')[0]
time.sleep(2)
googleClass.find_element(By.TAG_NAME, 'iframe').click()
time.sleep(2)
allIframesLen = driver.find_elements(By.TAG_NAME, 'iframe')
time.sleep(1)

# Look through every iframe for the audio button and click it
audioBtnFound = False
audioBtnIndex = -1
for index in range(len(allIframesLen)):
    driver.switch_to.default_content()
    iframe = driver.find_elements(By.TAG_NAME, 'iframe')[index]
    driver.switch_to.frame(iframe)
    driver.implicitly_wait(delayTime)
    try:
        # find_element raises if the button is not in this iframe
        driver.find_element(By.ID, 'recaptcha-audio-button').click()
        audioBtnFound = True
        audioBtnIndex = index
        break
    except Exception:
        pass

if audioBtnFound:
    try:
        while True:
            href = driver.find_element(By.ID, 'audio-source').get_attribute('src')
            audio = requests.get(href, stream=True)
            saveFile(audio, filename)
            response = audioToText(os.path.join(os.getcwd(), filename))
            print(response)
            # Return to the challenge iframe and type the transcribed text
            driver.switch_to.default_content()
            iframe = driver.find_elements(By.TAG_NAME, 'iframe')[audioBtnIndex]
            driver.switch_to.frame(iframe)
            inputbtn = driver.find_element(By.ID, 'audio-response')
            inputbtn.send_keys(response)
            inputbtn.send_keys(Keys.ENTER)
            time.sleep(2)
            # No visible error message means the answer was accepted
            errorMsg = driver.find_elements(By.CLASS_NAME, 'rc-audiochallenge-error-message')[0]
            if errorMsg.text == "" or errorMsg.value_of_css_property('display') == 'none':
                print("Success")
                break
    except Exception as e:
        print(e)
        print('Caught. Need to change proxy now')
else:
    print('Button not found. This should not happen.')
```
Video showing how this script works :
NOTE :
This script fails when we use private proxies that are not whitelisted. Moreover, running it on the same website from the same IP again and again may get you detected and flagged as a bot.
If you are using proxies or have to solve captchas again and again, try using third-party services that can solve CAPTCHAs for you at minimal cost.