Quickly crawl some pages and get the status codes back


This Python 3 script quickly crawls the URLs you have listed in the input.txt file and writes the results to the output.txt file. You can use it to quickly identify pages that return a particular HTTP status code. Feel free to copy the code into your own Python project; it will run like a charm.
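For example, if input.txt holds one URL per line (these URLs and status codes are purely illustrative; your results depend on the sites you crawl):

https://example.com
https://example.org/missing-page

then output.txt will receive one tab-separated line per URL, such as:

https://example.com	200	OK
https://example.org/missing-page	404	Not Found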

Quickly crawl URLs

import requests

# Define the input and output file names
input_path = "input.txt"
output_path = "output.txt"
OFFLINE = "offline"

# Open the input and output files
with open(input_path, "r") as input_file, open(output_path, "a") as output_file:
    # Loop through each line (URL) in the input file
    for url in input_file:
        # Remove leading/trailing whitespace and newline characters
        url = url.strip()
        if not url:
            continue  # Skip blank lines

        try:
            # Send an HTTP GET request to the URL (the timeout stops a
            # hanging server from stalling the whole crawl)
            response = requests.get(url, timeout=10)

            # Get the status code and status message
            status_code = response.status_code
            status_message = response.reason

            # Write the URL and status response to the output file in tab-separated format
            output_file.write(f"{url}\t{status_code}\t{status_message}\n")

            # Print the result to the console
            print(f"URL: {url}\tStatus Code: {status_code}\tStatus Message: {status_message}")
        except requests.exceptions.RequestException:
            # Connection errors, timeouts, invalid URLs, and similar failures
            output_file.write(f"{url}\t{OFFLINE}\t{OFFLINE}\n")
            print(url, OFFLINE)

print("Processing complete. Results saved to 'output.txt'.")

Reza Rafati https://cyberwarzone.com

Reza Rafati, based in the Netherlands, is the founder of Cyberwarzone.com. An industry professional providing insightful commentary on infosec, cybercrime, cyberwar, and threat intelligence, Reza dedicates his work to bolstering digital defenses and promoting cyber awareness.
