Python: extracting web page tables and databases

Published: 2021-03-24 01:49:03

❶ Help: how to use Python to fetch data from a database and generate a spreadsheet dynamically

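I. Third-party libraries you can use
The libraries commonly used for Excel files in Python are xlrd (reads Excel files), xlwt (writes Excel files) and openpyxl (reads and writes Excel files). In my experience xlrd reads large Excel files faster than openpyxl, so I used xlrd and xlwt in my scripts. Descriptions and download links are at http://www.python-excel.org/ . xlrd and xlwt cannot modify an existing workbook: you read the contents of the original file, process them, and then write the result to a new Excel file. (openpyxl, by contrast, can load an .xlsx file, change cells and save it. Note also that xlrd 2.0 and later reads only the legacy .xls format, and xlwt writes only .xls.)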
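II. Common problems
When processing Excel files with Python I ran into two fairly stubborn problems: Unicode encoding and the way Excel stores times.
In Python 2, xlrd returns text as unicode objects. When you print Chinese text read from Excel, or open a workbook or sheet with a Chinese name, Python 2 may try to convert the text with its default ASCII codec and fail with UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128). On Chinese-language Windows the console expects gb2312 (GBK), so calling VAR.encode('gb2312') before printing fixes the problem. (Oddly, sometimes the output prints but shows a string of escape codes instead of Chinese; this usually happens when a list or tuple is printed, because Python 2 then shows the repr of each element.) To read from an Excel file with a Chinese file name, prefix the name with u so that it is a unicode string. In Python 3, str is always Unicode and these workarounds are generally unnecessary.
In Excel, dates and times are stored as floating-point numbers. If a cell containing "20 March 2013" is switched to the "General" format, it shows 41353; switched back to a date format, it shows "20 March 2013" again. Likewise, xlrd returns dates and times as floats. Conversely, writing a date or time to Excel as a float is not a problem: set the cell format to date or time and it displays normally. The integer part counts days and the fractional part is the time of day. In Excel's default 1900 date system, serial 1 is displayed as 1 January 1900; because Excel wrongly treats 1900 as a leap year, for dates from 1 March 1900 onward the serial number equals the number of days since 30 December 1899 (41353 days after 1899-12-30 is 2013-03-20).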
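The serial-number arithmetic can be done with the standard datetime module alone. This is a minimal sketch, assuming Excel's default 1900 date system and serials from 61 (1900-03-01) onward: in that system Excel wrongly counts a 29 February 1900, so serial n is n days after 1899-12-30 (41353 is 2013-03-20). The function names are hypothetical; xlrd itself also offers xlrd.xldate_as_tuple(value, book.datemode), which additionally handles the 1904 date system.

```python
import datetime

# Excel 1900 date system: because of the fictitious 1900-02-29, serials from
# 61 (1900-03-01) onward count days from 1899-12-30.
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)


def from_excel_serial(serial):
    """Convert an Excel serial (integer part = days, fraction = time of day) to a datetime."""
    return EXCEL_EPOCH + datetime.timedelta(days=serial)


def to_excel_serial(value):
    """Convert a date or datetime to an Excel serial number (microseconds are ignored)."""
    if not isinstance(value, datetime.datetime):
        # A plain date means midnight of that day
        value = datetime.datetime(value.year, value.month, value.day)
    delta = value - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400


print(from_excel_serial(41353))       # 2013-03-20 00:00:00
print(from_excel_serial(41353.5))     # 2013-03-20 12:00:00
```

Writing to_excel_serial(...) into a cell and giving that cell a date format reproduces the behaviour described above.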
III. Common functions
The code below shows the main xlrd and xlwt functions, plus the date-related functions in datetime.

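import datetime

import xlrd  # reads .xls files (xlrd 2.0+ reads .xls only)
import xlwt  # writes .xls files


def testXlrd(filename):
    book = xlrd.open_workbook(filename)
    sh = book.sheet_by_index(0)
    print("Worksheet name(s):", book.sheet_names())
    print("book.nsheets:", book.nsheets)
    print("sh.name:", sh.name, "sh.nrows:", sh.nrows, "sh.ncols:", sh.ncols)
    # rowx and colx are 0-based: (0, 0) is A1, (0, 1) is B1
    print("A1:", sh.cell_value(rowx=0, colx=0))
    # Chinese text prints directly in Python 3; Python 2 needed .encode('gb2312')
    print("B1:", sh.cell_value(0, 1))


def testXlwt(filename):
    book = xlwt.Workbook()
    sheet1 = book.add_sheet('hello')
    book.add_sheet('word')
    sheet1.write(0, 0, 'hello')  # write(row, col, value), both 0-based
    sheet1.write(0, 1, 'world')
    row1 = sheet1.row(1)  # the second row
    row1.write(0, 'A2')
    row1.write(1, 'B2')

    sheet1.col(0).width = 10000  # in 1/256 of the width of the "0" character

    sheet2 = book.get_sheet(1)
    sheet2.row(0).write(0, 'Sheet 2 A1')
    sheet2.row(0).write(1, 'Sheet 2 B1')
    sheet2.flush_row_data()  # flush the rows written so far to free memory

    sheet2.write(1, 0, 'Sheet 2 A2')
    sheet2.col(0).width = 5000
    sheet2.col(0).hidden = True

    book.save(filename)


if __name__ == '__main__':
    testXlrd('你好.xls')
    testXlwt('helloWord.xls')
    # Excel serial <-> date: from 1900-03-01 on, serial n is n days
    # after 1899-12-30 (Excel's 1900 date system)
    base = datetime.date(1899, 12, 30).toordinal()
    serial = datetime.date(2013, 7, 16).toordinal() - base
    day = datetime.date.fromordinal(base + serial)
    print(serial, day, day.weekday())  # weekday(): Monday is 0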

❷ How to scrape data from a web page with Python

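The simplest option is urllib, whose usage differs between Python 2.x and Python 3.x. In Python 2.x:
import urllib
html = urllib.urlopen(url)
text = html.read()
For more complex jobs there is the requests library, which supports every request method, cookies, custom headers and so on.
For even more complex pages there is selenium, which can scrape text generated by JavaScript.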
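For Python 3, the same download uses urllib.request. This is a minimal sketch: the function name fetch_html, the browser-like User-Agent (some sites reject urllib's default) and the UTF-8 fallback are assumptions, and the charset is taken from the server's Content-Type header when one is declared.

```python
from urllib.request import Request, urlopen


def fetch_html(url, default_encoding="utf-8", timeout=30):
    """Download url with urllib.request (Python 3) and return the page as str."""
    # Assumed header: a browser-like User-Agent, since some sites block urllib's default
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        # Prefer the charset the server declares, e.g. "text/html; charset=gbk"
        charset = resp.headers.get_content_charset() or default_encoding
    return raw.decode(charset, errors="replace")
```

fetch_html("https://example.com") returns that page's HTML; with requests the rough equivalent is requests.get(url, timeout=30).text.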

❸ How to extract a table from a web page and save it as CSV with Python

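Hello. In Python, saving a CSV is easiest with the pandas library. The general pattern is:
import pandas as pd
df = pd.read_csv(filename1, encoding='utf-8')
# ... ...
df.to_csv(filename2, encoding='utf-8', index=False)  # index=False: don't write the row index
To load a table from a web page into a DataFrame in the first place, pd.read_html(url) returns a list of DataFrames, one per <table> on the page (it needs lxml, or BeautifulSoup4 with html5lib).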
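If pandas and its HTML parsers are not available, a table can also be pulled out with the standard library. This sketch swaps in html.parser and csv for pandas; the names TableParser and table_to_csv are hypothetical, and it assumes tables with explicit closing </td>/</tr> tags and no nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect cell text from every <table>: tables -> rows -> cells."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including newlines) inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html, index=0):
    """Return table number `index` (0-based) of `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.tables[index])
    return buf.getvalue()
```

To save the result, write it with open("table.csv", "w", encoding="utf-8-sig", newline="") as f: f.write(table_to_csv(page_html)); the utf-8-sig BOM helps Excel recognise UTF-8 Chinese text.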

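❹ How to scrape table data from a web page with Python

It depends on whether the table is static or generated dynamically. Here is code for a static table: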


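import urllib.request

from bs4 import BeautifulSoup  # pip install beautifulsoup4


def earse(strline, ch):
    # Remove every occurrence of ch. An empty ch made the original
    # find/replace loop run forever, so return the string unchanged.
    if not ch:
        return strline
    return strline.replace(ch, '')


url = r"http://www.bjsta.com"

resContent = urllib.request.urlopen(url).read()

# The page is decoded as gb18030 (a superset of GBK/GB2312)
resContent = resContent.decode('gb18030')

soup = BeautifulSoup(resContent, 'html.parser')

print(soup('title')[0].string)

tab = soup.find_all('table')

# Use the last <table> on the page
trs = tab[-1].find_all('tr')

for trIter in trs:
    tds = trIter.find_all('td')
    for tdIter in tds:
        span = tdIter('span')
        for i in range(len(span)):
            if span[i].string:
                # The character to remove appears to have been lost when this
                # answer was copied; put it in place of ''
                print(earse(span[i].string, '').strip(), end=' ')
    print()  # one table row per output line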

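❺ How to extract information from a web page with Python

# No third-party modules
import re
from urllib import request

url = ''  # your URL
req = request.Request(url)
with request.urlopen(req, timeout=60) as resp:
    htm = resp.read().decode('gbk', errors='ignore')

# 二氧化硫 = sulfur dioxide; capture the four values that follow it
pat = re.compile(r'二氧化硫<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<')
data = pat.search(htm)
if data:
    for i in range(5):
        print(data.group(i))  # group 0 is the whole match; groups 1-4 are the numbers wanted
else:
    print('pattern not found')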
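The regular expression can be checked offline before pointing it at the real page. The sample row below is hypothetical; real markup will differ.

```python
import re

# Same pattern as above: "二氧化硫" (sulfur dioxide) followed by four values,
# each one the first text starting with a digit after some tag
pat = re.compile(r'二氧化硫<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<.+?>(\d.*?)<')

# Hypothetical table row standing in for the downloaded page
sample = '<tr><td>二氧化硫</td><td>12</td><td>8.5</td><td>20</td><td>0.3</td></tr>'

m = pat.search(sample)
values = [float(v) for v in m.groups()] if m else []
print(values)  # [12.0, 8.5, 20.0, 0.3]
```

Note that each lazy .+? keeps extending until a digit follows a '>', so an empty or non-numeric cell (such as "-") is silently skipped and the following number is captured in its place.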

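❻ How to read data from a database with Python and display it on a web page

First, think it through:
To read the database, you need a database module (pymysql, pymssql, etc.).
To display the data on a web page, you need a web module (basic CGI/WSGI, or a ready-made web framework such as Django or Tornado).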
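Those two pieces can be combined without any third-party package. This is a sketch under assumptions: sqlite3 stands in for pymysql/pymssql (all follow the DB-API cursor interface, though placeholder styles differ), a bare WSGI function stands in for a framework, and the items table and its columns are hypothetical.

```python
import html
import sqlite3


def demo_connection():
    """In-memory SQLite database with a hypothetical `items` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [("apple", 3), ("pear", 5)])
    conn.commit()
    return conn


def rows_to_html(cursor):
    """Render a DB-API cursor's result set as an HTML table, escaping every value."""
    headers = "".join(f"<th>{html.escape(col[0])}</th>" for col in cursor.description)
    lines = ["<table>", f"<tr>{headers}</tr>"]
    for row in cursor.fetchall():
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def make_app(conn):
    """Build a WSGI application that shows the items table."""
    def app(environ, start_response):
        cur = conn.execute("SELECT name, qty FROM items ORDER BY name")
        body = ('<!DOCTYPE html><meta charset="utf-8">' + rows_to_html(cur)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app
```

To try it in a browser: from wsgiref.simple_server import make_server, then make_server("127.0.0.1", 8000, make_app(demo_connection())).serve_forever(), and open http://127.0.0.1:8000/.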

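❼ How to use urllib to fetch the second and later pages of a table on a web page

I don't have ready-made code at hand, so instead of posting code, here is the approach:

Although the URL in the address bar looks the same for every page, the actual requests are different; press F12 and analyse them in the browser's developer tools.

Then there are two ways to proceed:

1. Use what F12 reveals to request the real address directly;
2. Use Python to simulate clicking "Next page".

The code for either is not complicated and easy to find with a search engine such as Baidu.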
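The first way often turns out to be a POST request to the same URL with the page number in the form body, which is why the address bar never changes. This sketch assumes that case; the field name page and any extra form fields are hypothetical and must be copied from the request shown in the F12 Network tab. If the tab shows a GET whose query string changes instead, put the page number in the URL. The second way (clicking "Next page") needs browser automation such as selenium and is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def page_request(url, page, form=None, page_field="page"):
    """Build a POST request for one page of a paginated table.

    `page_field` and the extra `form` fields are hypothetical: copy the real
    names and values from the request shown in the F12 Network tab.
    """
    fields = dict(form or {})
    fields[page_field] = page
    # Passing data= makes urllib send a POST with a urlencoded form body
    return Request(url, data=urlencode(fields).encode("ascii"))


def fetch_pages(url, last_page, form=None, encoding="utf-8"):
    """Download pages 1..last_page and return their HTML as a list of str."""
    pages = []
    for n in range(1, last_page + 1):
        with urlopen(page_request(url, n, form), timeout=30) as resp:
            pages.append(resp.read().decode(encoding, errors="replace"))
    return pages
```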

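❽ How to read data from a web page with Python

Use a parsing module such as Beautiful Soup:

Beautiful Soup is an HTML/XML parser written in Python. It copes well with malformed markup and builds a parse tree;
it provides simple, commonly used operations for navigating, searching and modifying the parse tree;
download the page's HTML with urllib or urllib2 (recommended in Python 2; in Python 3 use urllib.request), then parse that HTML with Beautiful Soup;

then use Beautiful Soup's search functions or regular expressions to pick out the content you want and process it further. For example: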


from bs4 import BeautifulSoup

html = '<html><head><title>test</title></head><body><p>testbody</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')
soup.contents[0].name
# 'html'
soup.contents[0].contents[0].name
# 'head'
head = soup.contents[0].contents[0]
head.parent.name
# 'html'
head.next_element
# <title>test</title>

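❾ Without a web framework, how do I get the data submitted from a web form with Python and store it in a database? Please help!

Do you want to implement WSGI yourself or use the wsgiref module? Either way you need to understand the basics of WSGI: the data of a POSTed form can be read from environ['wsgi.input'], where environ is a parameter of the WSGI entry function (data from a GET form arrives in environ['QUERY_STRING']).

WSGI references: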
https://www.python.org/dev/peps/pep-3333/
https://pep-3333-wsgi.readthedocs.io/en/latest/
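A minimal sketch of that idea, assuming a URL-encoded form (the default for method="post") and using sqlite3 in place of whatever database you actually have; the table messages and the fields name and message are hypothetical. With pymysql the SQL placeholder is %s instead of ?.

```python
import sqlite3
from urllib.parse import parse_qs


def make_form_app(conn):
    """Return a WSGI app that saves POSTed form fields into SQLite.

    The table and field names (messages, name, message) are hypothetical.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS messages (name TEXT, message TEXT)")

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain"), ("Allow", "POST")])
            return [b"POST only"]
        # CONTENT_LENGTH may be absent or empty; read only that many bytes
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        raw = environ["wsgi.input"].read(length).decode("utf-8")
        # Handles application/x-www-form-urlencoded only, not multipart uploads
        form = parse_qs(raw)  # e.g. {'name': ['Tom'], 'message': ['hi']}
        name = form.get("name", [""])[0]
        message = form.get("message", [""])[0]
        # ? placeholders let sqlite3 quote the values (no SQL injection);
        # "with conn" commits the transaction
        with conn:
            conn.execute("INSERT INTO messages (name, message) VALUES (?, ?)", (name, message))
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"saved"]

    return app
```

To serve it with wsgiref: make_server("127.0.0.1", 8000, make_form_app(sqlite3.connect("form.db"))).serve_forever(), with an HTML form whose method is "post" and whose action points at that address.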
