Mr. Y:
Step 1: use a bit of JavaScript in the browser console to print out the page's URLs. The code is:
// Grab every <a> inside the first <tbody>, keep only those whose link text
// is at least 6 characters long (a rough filter for the real entries),
// and print the collected URLs one per line.
let b = document.getElementsByTagName("tbody")[0].getElementsByTagName("a");
let sum = "";
for (let i = 0; i < b.length; i++) {
    if (b[i].innerText.length >= 6) {
        sum += b[i].href + "\n";
    }
}
console.log(sum);
Step 2: copy the printed URLs and save them to a text file (pdf_url.txt here), then visit each one with the requests library to get the PDF URL from each page.
Python code:
import requests as req
import time
from bs4 import BeautifulSoup
from tqdm import tqdm

all_pdf = []
# Read back the URLs printed in step 1, one per line.
with open("./pdf_url.txt", "r", encoding="utf-8") as f:
    web_url = [i.strip() for i in f.readlines()]
# A browser-like User-Agent so the site serves the normal page.
header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.54'}
for url in tqdm(web_url):
    result = req.get(url, headers=header)
    bs_obj = BeautifulSoup(result.text, 'lxml')
    # The first element with class "pdf-link" carries the PDF's URL.
    pdf_url = bs_obj.find_all(class_="pdf-link")[0].get("href")
    all_pdf.append(pdf_url)
    time.sleep(0.5)  # throttle requests a little
print(all_pdf)
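One caveat: the loop above assumes every page has a pdf-link element with an absolute href. If some pages use relative links or lack the element entirely, a more defensive version of the same loop (just a sketch, still assuming the pdf-link class) would be:
from urllib.parse import urljoin

for url in tqdm(web_url):
    result = req.get(url, headers=header, timeout=10)
    bs_obj = BeautifulSoup(result.text, 'lxml')
    link = bs_obj.find(class_="pdf-link")  # first match, or None if absent
    if link is None or not link.get("href"):
        print(f"no pdf-link found on {url}")
        continue  # skip pages without a usable PDF link
    # urljoin resolves a relative href against the page's own URL.
    all_pdf.append(urljoin(url, link["href"]))
    time.sleep(0.5)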
That gives us all the PDF links; now just use the requests library again to download every PDF.
Python code:
import os

# Create the output folder on the first run.
if not os.path.exists("./pdf_results/"):
    os.mkdir("./pdf_results/")
for index, pdf_url in tqdm(enumerate(all_pdf), total=len(all_pdf)):
    result = req.get(pdf_url, headers=header)
    # Write the response bytes out as a numbered PDF file.
    with open(f"./pdf_results/{index}.pdf", "wb") as f:
        f.write(result.content)
    time.sleep(0.5)
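If some of the PDFs are large, it may be worth streaming each download instead of holding the whole file in memory; the Content-Type check below is an extra safeguard I would add, not something the code above does:
for index, pdf_url in tqdm(enumerate(all_pdf), total=len(all_pdf)):
    with req.get(pdf_url, headers=header, stream=True, timeout=30) as result:
        # Skip anything that isn't actually served as a PDF.
        if "pdf" not in result.headers.get("Content-Type", "").lower():
            print(f"skipping non-PDF response from {pdf_url}")
            continue
        with open(f"./pdf_results/{index}.pdf", "wb") as f:
            # Write in 8 KB chunks so large files never sit fully in memory.
            for chunk in result.iter_content(chunk_size=8192):
                f.write(chunk)
    time.sleep(0.5)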
I also saved the final result to Baidu Cloud and am sharing it with you.
Link: https://pan.baidu.com/s/16_QpPGWGpUvRwAiveQSO_g
Extraction code: 0000