Extracting web page tables and database data with Python

Published: 2021-03-24 01:49:03

❶ Help wanted: how to fetch data from a database with Python and generate a table dynamically

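I. Available third-party libraries
Common Python libraries for Excel files include xlrd (reading), xlwt (writing), and openpyxl (reading and writing). xlrd reads large workbooks faster than openpyxl, so I used xlrd and xlwt in my scripts. Descriptions and downloads are at http://www.python-excel.org/. Neither xlrd nor xlwt can modify an existing workbook in place. In general you read the contents of the original file, process them, and write the result to a new Excel file.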
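II. Common problems
Two problems proved troublesome when processing Excel files with Python: Unicode encoding, and the way Excel stores dates and times.
xlrd returns cell text as Python 2 unicode objects, so printing Chinese text read from a workbook, or reading a workbook or sheet that has a Chinese name, can fail with UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128). The error appears when Python falls back to the ascii codec to turn the unicode string into bytes for output, whereas Chinese Windows uses GB2312 for Chinese text. Calling VAR.encode('gb2312') before printing solves the problem. (Oddly, the output sometimes prints without error but shows escape codes instead of Chinese characters, for example when the string is printed as part of a list.) To read from a workbook with a Chinese file name, prefix the file name literal with u so that it is a unicode string.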
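The encoding failure described above can be reproduced in a few lines. This is a minimal Python 3 sketch rather than the Python 2 code of the original answer, and the sample string is an assumption chosen for illustration. In Python 3 every str is already Unicode, so an explicit encode is only needed when bytes are required.

```python
# Python 3 sketch: encoding Chinese text as ASCII fails, GB2312 works.
text = "你好"  # sample text (assumption, not taken from the original post)

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # Same error class as in the message quoted above
    ascii_ok = False

gb_bytes = text.encode("gb2312")        # GB2312 uses two bytes per Chinese character
round_trip = gb_bytes.decode("gb2312")  # decoding restores the original text

print(ascii_ok, len(gb_bytes), round_trip == text)
```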
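Excel stores dates and times as floating-point numbers. For example, when a cell containing '2013年3月20日' (March 20, 2013) is switched to the General format, it shows 41353; when the cell is switched back to a date format, it shows '2013年3月20日' again. Reading a date or time with xlrd therefore yields a float. For the same reason, it does not matter if a date or time is written to Excel as a float: set the cell format to a date or time format and it displays normally. In Excel's default 1900 date system, the number 1 represents January 1, 1900, so day 0 is December 31, 1899.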
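The serial-number arithmetic can be sketched with the standard datetime module. The function names below are hypothetical. The epoch of 1899-12-30 is an assumption that only holds for the Windows 1900 date system and for serial numbers from 61 upward, because Excel counts a nonexistent February 29, 1900. xlrd also provides xlrd.xldate_as_tuple(value, book.datemode) for the same purpose.

```python
import datetime

# Effective epoch for the 1900 date system, valid for dates from 1900-03-01 on
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial number (float) to a datetime."""
    return EXCEL_EPOCH + datetime.timedelta(days=serial)


def date_to_excel_serial(d):
    """Convert a datetime.date to the integer Excel serial number."""
    return (d - EXCEL_EPOCH.date()).days


# The example from the text: 41353 is March 20, 2013
print(excel_serial_to_datetime(41353).date())
print(date_to_excel_serial(datetime.date(2013, 3, 20)))
```

The fractional part of the serial number is the time of day, so 41353.5 is noon on the same date.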
III. Common functions
The following mainly covers xlrd, xlwt, and the date-related functions in datetime.

# -*- coding: utf-8 -*-
# Python 2 code
import datetime

import xlrd
import xlwt

def testXlrd(filename):
    book = xlrd.open_workbook(filename)
    sh = book.sheet_by_index(0)
    print "Worksheet name(s): ", book.sheet_names()[0]
    print 'book.nsheets', book.nsheets
    print 'sh.name:', sh.name, 'sh.nrows:', sh.nrows, 'sh.ncols:', sh.ncols
    # Row and column indexes are zero-based: (0, 1) is cell B1
    print 'B1:', sh.cell_value(rowx=0, colx=1)
    # If C1 contains Chinese text
    print 'C1:', sh.cell_value(0, 2).encode('gb2312')

def testXlwt(filename):
    book = xlwt.Workbook()
    sheet1 = book.add_sheet('hello')
    book.add_sheet('word')
    sheet1.write(0, 0, 'hello')
    sheet1.write(0, 1, 'world')
    row1 = sheet1.row(1)
    row1.write(0, 'A2')
    row1.write(1, 'B2')

    sheet1.col(0).width = 10000

    sheet2 = book.get_sheet(1)
    sheet2.row(0).write(0, 'Sheet 2 A1')
    sheet2.row(0).write(1, 'Sheet 2 B1')
    sheet2.flush_row_data()

    sheet2.write(1, 0, 'Sheet 2 A2')
    sheet2.col(0).width = 5000
    sheet2.col(0).hidden = True

    book.save(filename)

if __name__ == '__main__':
    testXlrd(u'你好.xls')
    testXlwt('helloWord.xls')
    # Day 0 of Excel's 1900 date system
    base = datetime.date(1899, 12, 31).toordinal()
    tmp = datetime.date(2013, 7, 16).toordinal()
    # Excel serial number of the date; +1 for Excel's nonexistent 1900-02-29
    serial = tmp - base + 1
    # Convert the serial number back to a date and print its weekday
    print datetime.date.fromordinal(base + serial - 1).weekday()

❷ How to scrape web page data with Python

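The simplest option is urllib. Its usage differs between Python 2.x and Python 3.x. Taking Python 2.x as the example:
import urllib
html = urllib.urlopen(url)
text = html.read()
For more complex needs, use the requests library, which supports all request types as well as cookies, headers, and so on.
For still more complex cases, use selenium, which can capture text generated by JavaScript.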
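For comparison, the Python 3.x form of the same urllib download might look like the sketch below. The helper name fetch_text and its default encoding and timeout are assumptions for illustration. The demo call uses a data: URL so it runs without network access; replace it with a real http(s) address.

```python
from urllib import request


def fetch_text(url, encoding="utf-8", timeout=30):
    """Download url and decode the body; the encoding is supplied by the caller."""
    with request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode(encoding, errors="replace")


# Offline demo: urllib.request can open data: URLs directly
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```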

❸ How to extract a web page table with Python and save it as CSV

Hello. Saving CSV from Python is easiest with the pandas library. The general pattern is:
import pandas as pd
df = pd.read_csv(filename1, encoding='utf-8')
... ...
df.to_csv(filename2, encoding='utf-8')
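The pandas pattern above starts from an existing CSV file, so it does not cover the step of getting the table out of the HTML. One possible way to do that step with only the standard library is sketched below. The class TableParser, the function table_to_csv, and the sample HTML are all made up for illustration. The parser treats every tr on the page as one row and does not handle nested tables. pandas also offers read_html, which needs an extra parser library installed.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample table
sample = ("<table><tr><th>city</th><th>so2</th></tr>"
          "<tr><td>Beijing</td><td>12</td></tr></table>")
print(table_to_csv(sample))
```

To write a file instead of a string, open it with newline="" and pass the file object to csv.writer.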

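❹ How to scrape table information from a web page with Python

It depends on whether the table you are scraping is static or dynamic. Here is code for a static table: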


# Python 2 code
import urllib2

# The original post omitted this import; the bs4 package is assumed here
from bs4 import BeautifulSoup

def earse(strline, ch):
    # Remove every occurrence of ch from strline
    return strline.replace(ch, '')

url = r"http://www.bjsta.com"

resContent = urllib2.urlopen(url).read()

# The page is GB18030-encoded; convert it to UTF-8
resContent = resContent.decode('gb18030').encode('utf8')

soup = BeautifulSoup(resContent, 'html.parser')

print soup('title')[0].string

# Take the last table on the page and walk its rows and cells
tab = soup.findAll('table')

trs = tab[-1].findAll('tr')

for trIter in trs:
    tds = trIter.findAll('td')
    for tdIter in tds:
        span = tdIter('span')
        for i in range(len(span)):
            if span[i].string:
                # The character to strip was lost from the original post
                print earse(span[i].string, '').strip(),
    print

❺ How to extract information from a web page with Python

# No third-party modules needed
from urllib import request
import re

url = ''  # your URL
req = request.Request(url)
with request.urlopen(req, timeout=60) as resp:
    htm = resp.read().decode('gbk', errors='ignore')

pat = re.compile(r'二氧化硫<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<')
data = pat.search(htm)
if data:  # search() returns None when the pattern is not found
    for i in range(5):
        print(data.group(i))  # group 0 is the whole match; groups 1-4 are the numbers wanted
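To see what the pattern captures without fetching a page, it can be run against a small sample. The HTML row below is made up for illustration and is only an assumption about what the target page looks like: the label 二氧化硫 (sulfur dioxide) followed by four numeric cells.

```python
import re

pat = re.compile(r'二氧化硫<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<')

# Hypothetical table row, not taken from any real page
sample = '<td>二氧化硫</td><td>12</td><td>34</td><td>56</td><td>78</td>'

match = pat.search(sample)
# groups() returns only the captured values, without the whole match
values = match.groups() if match else ()
print(values)
```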

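❻ How to read data from a database with Python and display it on a web page

Think about what you need first:
To read from the database, you need a database module (pymysql, pymssql, and so on).
To display the data on a web page, you need a web module (basic cgi/wsgi, or an existing web framework such as django/tornado).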
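The two pieces can be combined as in the rough sketch below. It swaps in the standard-library sqlite3 module for pymysql so that it runs without a database server, and it uses a plain WSGI callable for the web side. The users table, its columns, and the function names are assumptions for illustration. conn.execute is a sqlite3 shortcut; with pymysql you would create a cursor first.

```python
import html
import sqlite3


def rows_to_html_table(headers, rows):
    """Render column names and rows as an HTML table, escaping every cell."""
    head = "".join("<th>%s</th>" % html.escape(str(h)) for h in headers)
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % html.escape(str(c)) for c in row)
        for row in rows
    )
    return "<table><tr>%s</tr>%s</table>" % (head, body)


def make_app(conn):
    """Build a WSGI app that displays the hypothetical users table."""
    def app(environ, start_response):
        cur = conn.execute("SELECT id, name FROM users ORDER BY id")
        headers = [col[0] for col in cur.description]
        page = rows_to_html_table(headers, cur.fetchall()).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(page))),
        ])
        return [page]
    return app


# In-memory demo data (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ann",), ("<Bob>",)])
app = make_app(conn)
```

To view the page in a browser, the app can be served with wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever().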

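❼ How to scrape the second and later pages of a web page table with urllib in Python

I don't have ready-made code at hand, so I won't post any. Here is the idea:

Although the address in the URL bar looks the same for every page, the underlying requests are different. Press F12 to open the browser developer tools and analyze them.

Then there are two approaches:

Analyze the requests with F12 and fetch the real address directly;
Use Python to simulate clicking "next page".

Neither takes much code, and examples are easy to find with a search engine.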
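The first approach could be sketched as follows, assuming the real address found with F12 takes the page number as a query parameter. The parameter name page, the example.com address, and both function names are hypothetical; the real ones depend on the site. Only page_url is exercised in the demo, because fetch_pages performs network requests.

```python
from urllib import parse, request


def page_url(base_url, page, param="page"):
    """Return base_url with the page number set in its query string."""
    parts = parse.urlsplit(base_url)
    query = dict(parse.parse_qsl(parts.query))
    query[param] = str(page)
    return parse.urlunsplit(parts._replace(query=parse.urlencode(query)))


def fetch_pages(base_url, first, last, encoding="utf-8"):
    """Yield the decoded HTML of pages first..last (performs network requests)."""
    for page in range(first, last + 1):
        with request.urlopen(page_url(base_url, page), timeout=30) as resp:
            yield resp.read().decode(encoding, errors="replace")


print(page_url("http://example.com/list?type=a", 2))
```

If the site loads later pages with POST requests instead, the same loop applies with the page number sent in the request body.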

❽ How to read data from a web page with Python

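Use a parsing module such as Beautiful Soup:

Beautiful Soup is an HTML/XML parser written in Python. It copes well with malformed markup and builds a parse tree;
It provides simple, commonly used operations for navigating, searching, and modifying the parse tree;
Download the page's HTML with urllib or urllib2 (recommended), then parse it with Beautiful Soup;

Then use Beautiful Soup's search functions or regular expressions to find the content you want and process it. For example: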


# The original post omitted this import; the bs4 package is assumed here
from bs4 import BeautifulSoup

html = '<html><head><title>test</title></head><body><p>testbody</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')
soup.contents[0].name
# u'html'
soup.contents[0].contents[0].name
# u'head'
head = soup.contents[0].contents[0]
head.parent.name
# u'html'
head.next_element
# <title>test</title>

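❾ Without a web framework, how can Python receive data submitted from a web form and store it in a database? Any help appreciated

Do you want to implement WSGI yourself or use the wsgiref module? Either way you need to understand the basics of WSGI: all form data can be read from environ['wsgi.input'], where environ is a parameter of the WSGI entry function.

WSGI references: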
https://www.python.org/dev/peps/pep-3333/
https://pep-3333-wsgi.readthedocs.io/en/latest/
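A minimal sketch of that idea is shown below. It reads a urlencoded POST body from environ['wsgi.input'] and stores it with the standard-library sqlite3 module. The messages table, the form fields name and body, and the choice of sqlite3 are assumptions for illustration. Multipart forms and file uploads are not handled.

```python
import io
import sqlite3
from urllib.parse import parse_qs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (name TEXT, body TEXT)")  # hypothetical table


def app(environ, start_response):
    """Store a urlencoded POST form (fields name and body) in the database."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST only"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Read exactly CONTENT_LENGTH bytes of the request body
    raw = environ["wsgi.input"].read(length)
    form = parse_qs(raw.decode("utf-8"))
    name = form.get("name", [""])[0]
    body = form.get("body", [""])[0]
    with conn:  # commits on success
        conn.execute("INSERT INTO messages (name, body) VALUES (?, ?)", (name, body))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"saved"]


# Simulated request, so the sketch can be tried without starting a server
payload = b"name=Ann&body=hello+world"
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": str(len(payload)),
    "wsgi.input": io.BytesIO(payload),
}
result = app(environ, lambda status, headers: None)
print(result, conn.execute("SELECT name, body FROM messages").fetchall())
```

For real requests, the same app can be served with wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever().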
