This Python 3 script crawls the URLs listed in input.txt, one per line, and appends each URL's HTTP status code and status message to output.txt as tab-separated values. URLs that cannot be reached are recorded as offline. Use it to quickly find pages that return a particular status code, such as a 404. Feel free to copy the code into your own project; it only needs the requests library installed.
Quickly crawl URLs
import requests

# Define the input and output file names
input_path = "input.txt"
output_path = "output.txt"
off = "offline"

# Open the input and output files (results are appended to the output file)
with open(input_path, "r") as input_file, open(output_path, "a") as output_file:
    # Loop through each line (URL) in the input file
    for line in input_file:
        # Remove leading/trailing whitespace and newline characters
        url = line.strip()
        # Skip blank lines
        if not url:
            continue
        try:
            # Send an HTTP GET request to the URL (redirects are followed);
            # the timeout keeps one unresponsive host from stalling the run
            response = requests.get(url, timeout=10)
            # Get the status code and status message
            status_code = response.status_code
            status_message = response.reason
        except requests.exceptions.RequestException:
            # The request failed (invalid URL, DNS error, timeout, ...)
            output_file.write(f"{url}\t{off}\t{off}\n")
            print(url, off)
            continue
        # Write the URL and status response to the output file in tab-separated format
        output_file.write(f"{url}\t{status_code}\t{status_message}\n")
        # Print the result to the console
        print(f"URL: {url}\tStatus Code: {status_code}\tStatus Message: {status_message}")

print(f"Processing complete. Results saved to '{output_path}'.")
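Once output.txt exists, the tab-separated results can be filtered to pull out the pages with a given status. Here is a minimal sketch, assuming the three-column format the script writes (URL, status code, status message). The helper names parse_results and urls_with_status and the example.com URLs are hypothetical and not part of the script above.

```python
from collections import defaultdict


def parse_results(lines):
    """Group URLs by the status column of tab-separated result lines."""
    by_status = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines that lack the three expected columns
        if len(parts) != 3:
            continue
        url, status, _message = parts
        by_status[status].append(url)
    return dict(by_status)


def urls_with_status(lines, status):
    """Return the URLs whose status column equals `status` (e.g. 404 or "offline")."""
    return parse_results(lines).get(str(status), [])


# Hypothetical sample lines in the same format the script writes
sample = [
    "https://example.com/\t200\tOK\n",
    "https://example.com/missing\t404\tNot Found\n",
    "https://example.invalid/\toffline\toffline\n",
]
print(urls_with_status(sample, 404))
```

To run it against the real results, pass the open file instead of the sample list, for example by opening output.txt in a with block and calling urls_with_status(f, 404). Because the script appends to output.txt, a URL checked in several runs will appear once per run.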