Thursday, March 9, 2023

Diary #6 Minimalist PHP Proxy

ProxyMe.PHP

Setting: I want to access websites from a machine that isn't connected to the internet but is connected to a computer running a local webserver. Examples: PS4, PS3, or Xbox 360.

Here is the PHP script on GitHub.

The Journey: This is the solution I used to get what I needed, which was nothing complicated: just viewing a webpage.

Solution: host a PHP page that fetches the webpage from the requested URL and returns it to the requestor.
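For illustration, here is a minimal sketch of that fetch-and-return idea. It is written in Python rather than PHP, so it is not the actual ProxyMe.PHP script. It assumes the target page arrives as a ?url= query parameter, as in the proxyme.php?url=... example in this post, and it does no URL rewriting or MIME handling. The names target_from_path, ProxyHandler, and serve, and the User-Agent value, are hypothetical.

```python
# Minimal fetch-and-return proxy: a Python sketch of the idea, not ProxyMe.PHP.
# Assumption: the target arrives as ?url=..., e.g. /proxyme.php?url=http://pics.com/
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit
from urllib.request import Request, urlopen


def target_from_path(path):
    """Return the (percent-decoded) ?url= value from a request path, or None."""
    values = parse_qs(urlsplit(path).query).get("url")
    return values[0] if values else None


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = target_from_path(self.path)
        if not target:
            self.send_error(400, "missing ?url= parameter")
            return
        try:
            # Hypothetical browser-like User-Agent; fetch the ENTIRE response first.
            req = Request(target, headers={"User-Agent": "Mozilla/5.0"})
            with urlopen(req, timeout=30) as upstream:
                body = upstream.read()
        except OSError as exc:  # covers URLError, HTTPError, and timeouts
            self.send_error(502, f"upstream fetch failed: {exc}")
            return
        # Like PHP's default, label everything text/html (no MIME handling yet).
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Listen on all interfaces so other machines on the LAN can connect."""
    HTTPServer(("0.0.0.0", port), ProxyHandler).serve_forever()
```

Calling serve(8080) on the webserver machine and pointing the console's browser at http://<server-ip>:8080/proxyme.php?url=http://example.com/ would return the raw page, with images, scripts, and CSS still broken in the way the post goes on to describe.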

Problem: Images don't load, scripts don't execute, and CSS isn't applied. That's because the page's images, scripts, and stylesheets are loaded separately from their own URLs, and the offline machine can't reach those URLs; it can only reach our webserver.

Solution: Rewrite every URL in the page so it routes back through the proxy on our webserver. Example: http://cool-pics.com/1.jpg -> http://192.168.1.253/proxyme.php?url=http://cool-pics.com/1.jpg, and so on. So when the requesting machine asks for the page 'pics.com' and that page contains something like <body><img src='http://pics.com/ha.jpg'></body>, the image's src is changed to point at our webserver, with the original src passed as the query parameter. When the browser then tries to load the image, it asks our webserver, which it can reach, and the webserver fetches the image.
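A rough Python sketch of that rewriting step is below; the real PHP script may do it differently. PROXY is the example address from this post, and rewrite_urls and ATTR_RE are hypothetical names. One assumption goes beyond the post: the target URL is percent-encoded before being appended, because an unencoded target that has its own ? or & would otherwise be split into separate query parameters.

```python
# Sketch of URL rewriting (Python, not the actual PHP). PROXY is the example
# address from this post; ATTR_RE only catches src=/href= attributes.
import re
from urllib.parse import quote, urljoin

PROXY = "http://192.168.1.253/proxyme.php?url="
ATTR_RE = re.compile(r"""(\b(?:src|href)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_urls(html, page_url):
    """Point every src/href in html back through PROXY."""
    def repl(match):
        prefix, q, original = match.group(1), match.group(2), match.group(3)
        # Resolve relative links (e.g. /ha.jpg) against the page's own URL.
        absolute = urljoin(page_url, original)
        if not absolute.startswith(("http://", "https://")):
            return match.group(0)  # leave data:, javascript:, mailto: alone
        # Percent-encode the target so its own ? and & survive as one parameter.
        return f"{prefix}{q}{PROXY}{quote(absolute, safe='')}{q}"
    return ATTR_RE.sub(repl, html)
```

For example, rewrite_urls("<body><img src='http://pics.com/ha.jpg'></body>", "http://pics.com/") turns the src into http://192.168.1.253/proxyme.php?url=http%3A%2F%2Fpics.com%2Fha.jpg. A regex like this misses CSS url(...) references, srcset, and URLs built by JavaScript at runtime; an HTML parser would be sturdier.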

Problem: Images still don't appear, because the MIME type is wrong (by default, PHP labels its output as text/html).

Solution: Check the file extension in the PHP script and return the matching MIME type for each extension. Now when the requesting machine requests an image, our webserver sends a Content-Type header that tells the browser what to do with it.
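The extension check could look something like this Python sketch. The MIME_BY_EXT table and mime_for_url function are hypothetical (not taken from the PHP script), and the fallback to text/html for unknown or missing extensions is an assumption.

```python
# Sketch of extension-based MIME lookup (Python, not the actual PHP script).
import posixpath
from urllib.parse import urlsplit

# Hypothetical table; extend as needed.
MIME_BY_EXT = {
    ".html": "text/html",
    ".htm": "text/html",
    ".css": "text/css",
    ".js": "text/javascript",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
}


def mime_for_url(url, default="text/html"):
    """Guess a Content-Type from the URL path's extension."""
    path = urlsplit(url).path  # drop ?query and #fragment before checking
    ext = posixpath.splitext(path)[1].lower()
    return MIME_BY_EXT.get(ext, default)
```

The result would go into the Content-Type response header, e.g. image/jpeg for http://cool-pics.com/1.jpg. URLs without an extension fall back to the default, so forwarding the upstream server's own Content-Type header is another option worth considering.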

There are more issues I've run into but haven't looked into or been motivated to fix. One foreseeable issue is, of course, video. The proxy works by fetching the file from the internet before serving it to the requestor. The ENTIRE file. That's no problem for a 100 kB image, but a 3 GB video?! What probably keeps this from being a graver issue, in case a user unknowingly requests a video with this setup, is that PHP's max_execution_time defaults to 30 seconds, so the loading should stop around then (unverified; the PHP manual notes that on non-Windows systems, time spent outside the script itself, such as stream operations, doesn't count toward that limit, so it may not cut off a long download).
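One way around whole-file buffering, which the script does not currently do, would be to relay the response in chunks as it arrives, optionally with a byte cap. A Python sketch follows; relay, CHUNK_SIZE, and max_bytes are hypothetical names, and the 64 KiB chunk size is an arbitrary choice.

```python
# Sketch: stream a response in chunks instead of buffering the whole file.
# Hypothetical helper; src/dst are any binary file-like objects.
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; arbitrary choice


def relay(src: io.BufferedIOBase, dst: io.BufferedIOBase, max_bytes=None):
    """Copy src to dst chunk by chunk, stopping after max_bytes if given.

    Returns the number of bytes copied.
    """
    copied = 0
    while True:
        want = CHUNK_SIZE
        if max_bytes is not None:
            want = min(want, max_bytes - copied)
            if want <= 0:
                break  # cap reached
        chunk = src.read(want)
        if not chunk:
            break  # upstream finished
        dst.write(chunk)
        copied += len(chunk)
    return copied
```

In a Python proxy, src could be the object returned by urllib.request.urlopen and dst the handler's self.wfile, so only one chunk sits in memory at a time and the browser starts receiving data right away. A max_bytes cap would be a blunt guard against accidentally pulling a multi-gigabyte video.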

Another potential issue: Cloudflare and similar services. Because the proxy isn't a browser, it can't solve the challenges that the major browsers are built to pass, so even changing the HTTP User-Agent wouldn't make it work with Cloudflare-protected sites. I don't think there's a quick, convenient way around that. A good way to tell whether a particular site uses Cloudflare or similar: when you search Google Images for that site, do the images come up small and blurry, at less than their reported resolution? Cloudflare, most likely. I figure Google cached a thumbnail because the original image can't be reached the usual way images are requested.

Of course, it was only meant to be a one-day personal project, so these limitations are acceptable to me. It accomplished its goal with flying colors.

Proxies are hard.