Bypass reCAPTCHA v2 with Python
Background
Buying gift cards online saves me 3-5% on grocery shopping in Australia. There are two big players in the scene: Woolworths and Coles.
Woolworths has the 'Woolworths Money App', which updates a gift card's balance automatically once the card is registered.
vs
Coles has a website where you can check a gift card balance after entering the 16-20 digit card number, the PIN, and proof of not being a robot "every time". Yes, you repeat this for each card.
Keeping track of Coles gift card balances is non-trivial. I have been meaning to automate this process, but repeatedly failed to bypass reCAPTCHA. It is reCAPTCHA version 2, the kind where you just need to click a checkbox to prove your bot-less-ness.
Past tries
In the past I tried to bypass reCAPTCHA on the Coles balance-checking site with Puppeteer (Node.js) and Selenium (Python), without success. I am unsure whether this was because I did not follow all the steps mentioned below...
Bypassed successfully
I became aware of Playwright, a newer testing tool that is an alternative to Puppeteer/Selenium.
So I tried again, and this time it worked, although more often than not it still fails to bypass and asks me to select images. I consider this a win, as I can just select the images manually when checking my gift card balances. Tracking the cards and typing in their details is handled for me.
I made a simple Django app on top of the scraping script.
The Python snippet below shows how reCAPTCHA version 2 can be bypassed.
Steps I took to get it working:
- headless browser with headless mode off 🥲
  It fails when headless is set to True. It also sometimes fails to bypass and needs human interaction, which again requires headless=False.
- change the navigator.webdriver value to false
- scroll to the "I'm not a robot" checkbox
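The three steps above can be sketched with Playwright's synchronous Python API roughly as follows. This is a minimal illustration, not the original script: BALANCE_URL and CHECKBOX_FRAME are hypothetical placeholders (the real page address and the reCAPTCHA iframe selector are assumptions), and clicking the checkbox and filling in the card details are left out.

```python
# Sketch only. BALANCE_URL and CHECKBOX_FRAME are hypothetical placeholders,
# not values taken from the real balance-checking site.
BALANCE_URL = "https://example.com/gift-card-balance"
CHECKBOX_FRAME = "iframe[title='reCAPTCHA']"  # assumed title of the checkbox iframe

# JavaScript injected before any page script runs, so that
# navigator.webdriver reports false instead of true.
WEBDRIVER_PATCH = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => false});"
)


def open_balance_page(url: str = BALANCE_URL) -> None:
    """Open the page in a visible browser and scroll to the checkbox."""
    # Imported here so the constants above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Step 1: headless mode off, so a human can solve an image challenge.
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()

        # Step 2: change the navigator.webdriver value to false.
        context.add_init_script(WEBDRIVER_PATCH)

        page = context.new_page()
        page.goto(url)

        # Step 3: scroll to the "I'm not a robot" checkbox.
        page.locator(CHECKBOX_FRAME).first.scroll_into_view_if_needed()

        # Keep the window open for manual interaction before closing.
        input("Solve any image challenge in the browser, then press Enter...")
        browser.close()
```

Calling open_balance_page() opens a visible Chromium window. Because the image challenge still appears more often than not, the sketch pauses for manual input rather than assuming the bypass succeeded.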
To do
More ways to avoid headless browser detection need to be tried.
Side note
I tried the same code with Playwright in Node.js, but for some reason it kept failing without a single success.