I decided to check which pages of my blog are in Google's Supplemental Results and work out why, so I could change those pages and (hopefully) get them out.
For many pages the reason was quite easy to identify (in particular for my directory), but a few took a little more effort to figure out. In the end, I found duplicate content: exact copies of some of my pages exist on another site (pr08.totalunlock.co.uk). It is a proxy site that publicly logs a link for every page its users visit, and those proxied pages are being indexed by search engines (why the site owner set it up this way, I do not know).
Unfortunately, blocking the site (by returning a 403) is not so easy: its IP address is dynamic, and the proxy script sends nothing particularly unique when making a request (it passes along the visitor’s user agent)…
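To make the idea of an IP-based 403 concrete, here is a minimal sketch in Python, written as a WSGI middleware. It is only an illustration and not how my blog is actually set up: `block_proxy` and `BLOCKED_IPS` are names made up for this example, and the address in the set is a documentation placeholder, not the proxy's real IP.

```python
# Hypothetical sketch: refuse requests from known proxy IPs with a 403.
# The address below is a placeholder from the documentation range,
# not the real IP of the proxy site.
BLOCKED_IPS = {"203.0.113.7"}


def block_proxy(app, blocked_ips=BLOCKED_IPS):
    """Wrap a WSGI app so requests from blocked IPs get 403 Forbidden."""

    def middleware(environ, start_response):
        # REMOTE_ADDR is the standard WSGI key for the client's address.
        if environ.get("REMOTE_ADDR") in blocked_ips:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)

    return middleware
```

Because the user agent belongs to the visitor, the IP address is the only thing this check can key on, which is exactly why a dynamic IP makes the block fragile.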
So for now I just need to check daily whether the IP has changed and hope Googlebot picks up the 403. The next step is to contact the admin and hope he’ll stop the indexing so that those pages get de-indexed… although I still have not been able to find where Google is picking up the links from.
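The daily check could be automated with something like the sketch below. It assumes the proxy's current IP can be found by resolving its hostname, which may not match the address its requests actually come from. `check_proxy_ip` and the `proxy_ip.txt` state file are hypothetical names chosen for this example.

```python
import socket
from pathlib import Path

PROXY_HOST = "pr08.totalunlock.co.uk"  # hostname of the proxy site
STATE_FILE = Path("proxy_ip.txt")      # hypothetical file holding the last seen IP


def check_proxy_ip(host=PROXY_HOST, state_file=STATE_FILE,
                   resolve=socket.gethostbyname):
    """Resolve host and return (current_ip, changed_since_last_run)."""
    current = resolve(host)
    previous = state_file.read_text().strip() if state_file.exists() else None
    changed = current != previous
    if changed:
        # Remember the new address for the next run.
        state_file.write_text(current + "\n")
    return current, changed


if __name__ == "__main__":
    ip, changed = check_proxy_ip()
    if changed:
        print(f"Proxy IP is now {ip}; the 403 rule needs updating.")
```

Run once a day (from cron, for example), this only tells me when the block needs updating; the first run always reports a change because there is no stored address yet.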