002.08 News Search: PySimpleGUI + News API

Posted by Jason990420 on 2019-12-15

Topic: 002.08 News Search: PySimpleGUI + News API

Created: 2019/12/15

Updated: None

Language: Python 3.7.2, PySimpleGUI 4.11.0, newsapi 0.1.1

System: Win10 Ver. 10.0.17763

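I recently came across a package that gives access to more than 30,000 news sources. To make searching live news easier, I wrote a simple program that queries related news from the past month (the limit for free users), shows a short description of each article, and lets you go to the original source to read the full story.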

1. Features:

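  • Language selection: currently only Arabic, Chinese, Dutch, English, French, German, Hebrew, Italian, Northern Sami, Norwegian, Portuguese, Russian, Spanish and Swedish are available.
  • Date range: choose a start date and an end date.
  • Keywords: words or phrases to search for in the article title and body.
  • Advanced search is supported:
    • Surround a phrase with double quotes (") for an exact match.
    • Prepend + to words or phrases that must appear, e.g. +bitcoin
    • Prepend - to words that must not appear, e.g. -bitcoin
    • Alternatively, use the AND / OR / NOT keywords, optionally grouped with parentheses, e.g. crypto AND (ethereum OR litecoin) NOT bitcoin.
  • Date: free users can only select dates within the last month.
  • Speed: the larger the page_size used to load the web data, the slower the search. It is currently set to 100 (the maximum), so please be patient.
    You can change it to a smaller number, such as 20.
  • URL: click the title of a news item to open its source URL in the browser.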
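All of these operators are plain text inside the search string. The sketch below assumes a hypothetical helper, build_query(), which is not part of newsapi or of the program in section 4; it only assembles a string in the AND / OR / NOT form described above:

```python
def build_query(all_of=(), any_of=(), none_of=(), phrase=None):
    """Hypothetical helper: build a News API q string from advanced-search parts."""
    terms = []
    if phrase:
        terms.append(f'"{phrase}"')                     # exact-phrase match
    terms.extend(all_of)                                # every one of these must match
    if any_of:
        terms.append('(' + ' OR '.join(any_of) + ')')   # at least one of these
    query = ' AND '.join(terms)
    for word in none_of:                                # these must not appear
        query = f'{query} NOT {word}' if query else f'NOT {word}'
    return query


print(build_query(all_of=['crypto'], any_of=['ethereum', 'litecoin'],
                  none_of=['bitcoin']))
# crypto AND (ethereum OR litecoin) NOT bitcoin
```

The resulting string could be typed into the Key Words box, or passed as q to get_everything().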

2. A brief introduction to the main packages, PySimpleGUI and newsapi

  • PySimpleGUI part:

    The basic structure for creating a window is:

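    import PySimpleGUI as sg

    # Each inner list is one row of elements; an element is identified by its key
    layout = [[sg.InputText(key='key1', enable_events=True)],
              [sg.Button('Search', key='key2')]]
    window = sg.Window('Title', layout=layout)   # other options can be added here
    while True:
        event, values = window.read()    # wait for the next event
        if event is None:                # the window was closed
            break
        if event == 'key1':
            print('Text changed:', values['key1'])    # do something
        if event == 'key2':
            print('Search for:', values['key1'])      # do something
    window.close()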
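    • Elements are basically like tkinter widgets. To keep them easy to use, they expose only a few simple, essential parameters, so special requirements are another matter.
    • The window is described by a layout (a list of rows); some elements, such as Frame and Column, take a layout of their own.
    • A 'key' identifies the element that generated an event. (The word "element" is used instead of tkinter's "widget" mainly to avoid confusion.)
    • All events are read with window.read().
  • newsapi part: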

    from newsapi import NewsApiClient

    # Create the client with your own API key (placeholder below)
    newsapi = NewsApiClient(api_key='API_KEY')

    # Top headlines: sources cannot be combined with country or category
    top_headlines = newsapi.get_top_headlines(q='bitcoin',
                                              category='business',
                                              language='en',
                                              country='us')

    # Search all articles
    all_articles = newsapi.get_everything(q='bitcoin',
                                          sources='bbc-news,the-verge',
                                          domains='bbc.co.uk,techcrunch.com',
                                          from_param='2017-12-01',
                                          to='2017-12-12',
                                          language='en',
                                          sort_by='relevancy',
                                          page=2)

    # List the available publishers
    sources = newsapi.get_sources()
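    • Create the client object with NewsApiClient().
    • It has only three methods: get_top_headlines(), get_everything() and get_sources().
    • get_top_headlines(): returns live top and breaking headlines.
    • get_everything(): searches millions of articles from more than 30,000 large and small news sources and blogs.
    • get_sources(): returns the available publishers; use it to keep track of them, or pass the list straight through to your users.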
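In the PySimpleGUI event loop above, every key gets its own if. As an alternative (a different technique from the if-chain used in the program below), handlers can be looked up by key in a dict. In this sketch the handler names are made up, and a hypothetical list of (event, values) pairs stands in for window.read(), so the dispatch logic runs without opening a window:

```python
def on_search(values):
    """Hypothetical handler for the element with key 'key2'."""
    return f"search for {values['key1']!r}"


def on_combo(values):
    """Hypothetical handler for the element with key 'Combo'."""
    return f"language set to {values['Combo']}"


# Map each element key to the function that handles its events
HANDLERS = {'key2': on_search, 'Combo': on_combo}


def run(events):
    """Handle (event, values) pairs the way a window.read() loop would."""
    log = []
    for event, values in events:
        if event is None:              # window closed: leave the loop
            break
        handler = HANDLERS.get(event)
        if handler is not None:        # events without a handler are ignored
            log.append(handler(values))
    return log


fake_events = [('key2', {'key1': 'bitcoin', 'Combo': 'English'}),
               ('Combo', {'key1': 'bitcoin', 'Combo': 'German'}),
               ('unknown', {}),
               (None, None),
               ('key2', {'key1': 'never handled', 'Combo': 'English'})]
print(run(fake_events))
# ["search for 'bitcoin'", 'language set to German']
```

With a real window, the loop body would be event, values = window.read() followed by the same HANDLERS.get(event) lookup.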

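get_everything() returns a dict with status, totalResults and an articles list; each article carries a nested source dict (id, name) plus fields such as author, title, description, url, urlToImage, publishedAt and content, any of which may be None. A minimal sketch of flattening that into plain strings, similar in spirit to News.convert() in the program below; the sample is made-up data with the same shape, not a live call, and flatten() is a hypothetical name:

```python
def flatten(response):
    """Turn a News API response dict into a list of flat, None-free article dicts."""
    if response.get('status') != 'ok':
        return []
    articles = []
    for raw in response.get('articles', []):
        item = {key: ('' if value is None else value) for key, value in raw.items()}
        item['source'] = (raw.get('source') or {}).get('name') or ''  # keep only the name
        articles.append(item)
    return articles


# Hypothetical sample with the same shape as a get_everything() result
sample = {
    'status': 'ok',
    'totalResults': 1,
    'articles': [{
        'source': {'id': None, 'name': 'Example News'},
        'author': None,
        'title': 'Bitcoin rises',
        'description': 'A short summary.',
        'url': 'https://example.com/a',
        'urlToImage': None,
        'publishedAt': '2019-12-15T08:00:00Z',
        'content': None,
    }],
}
for article in flatten(sample):
    print(article['source'], '|', article['title'], '|', repr(article['urlToImage']))
# Example News | Bitcoin rises | ''
```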
3. Screenshot

[Screenshot: 002.08 News Search: PySimpleGUI + News API]

4. Code

Note: the code contains the line my_api_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'. This is the API key (authorization code); you can get one by registering on the newsapi website.
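Instead of editing the key into the source, one option is to read it from an environment variable. A minimal sketch; the variable name NEWSAPI_KEY and the helper get_api_key() are assumptions for illustration, not something newsapi itself reads:

```python
import os


def get_api_key(env_var='NEWSAPI_KEY'):
    """Read the News API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'Set {env_var} to the key from https://newsapi.org/register')
    return key


# In the program, the hard-coded line could then become:
# my_api_key = get_api_key()
```

This keeps the key out of any copy of the script you share.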

#!/usr/bin/python
'''
Search worldwide news with PySimpleGUI code & news API
Get breaking news headlines, and search for articles from over 30,000 news
sources and blogs with news API. News API is a simple and easy-to-use API
that returns JSON metadata for headlines and articles live all over the web
right now.
'''
import PySimpleGUI as sg
from tkinter import font as FONT
from newsapi import NewsApiClient
from PIL import Image
from io import BytesIO
import requests
import _thread
import webbrowser
import datetime
import dateutil.relativedelta
import base64
import ctypes
import os

class News():
    '''
    News class: fetch news with newsapi and load photos from the source web sites
    '''
    def __init__(self, text):

        self.date           = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
        self.coding         = 'utf-8'
        self.text           = text
        self.stop           = False
        self.raw_data       = self.read()
        self.length         = len(self.raw_data)
        self.data           = self.convert()
        self.width          = [0 for i in range(self.length)]
        self.height         = [0 for i in range(self.length)]
        self.base64         = [0 for i in range(self.length)]
        self.photo          = []
        self.where          = 0

    def convert(self):
        # Convert raw article dicts to flat dicts of plain strings
        if self.length == 0: return []
        # (old, new) pairs: drop <b> tags, join lines, replace curly quotes
        replacements = [('<b>', ''), ('</b>', ''), ('\n', ' '), ('’', "'"),
                        ('”', "'"), ('“', "'")]
        result = [{} for i in range(self.length)]
        for i in range(self.length):
            for key, value in self.raw_data[i].items():
                if key == 'source': value = self.raw_data[i]['source']['name']
                if value is None: value = ''
                for old, new in replacements:
                    value = value.replace(old, new)
                result[i][key] = value
        return result

    def read(self):
        # Load news from the newsapi web site
        try:
            newsapi = NewsApiClient(api_key=my_api_key)
            result  = newsapi.get_everything(
                q=self.text,
                language=Language[default],
                page_size=page_size,
                from_param=start, to=stop)
        except Exception:
            sg.popup('Server link failed !')
            return []
        if result['status'] != 'ok':
            sg.popup('Server link failed !')
            return []
        return result['articles']

    def update(self):
        # Load and draw the photos in background threads
        if self.length == 0: return
        for i in range(self.length):
            # self.image(i)     # Slow, but safe
            _thread.start_new_thread(self.image, (i,)) # Quick, but tkinter is not thread-safe

    def image(self, i):
        # Draw photo i on the window canvas (called from a worker thread)
        if self.stop:
            return
        loaded = self.load(i)   # load photo from web site by URL
        if self.stop:           # a newer search started while loading
            return
        if not loaded:
            # No photo: draw an 'X' in its place, shifted by the scroll offset
            draw.DrawText('X',
                (gap*2+int(im_w/2),
                 canv_h-gap*2-int(im_h/2)-i*(gap*3.5+im_h)-self.where),
                color='white', font=font)
            return
        offset = i*(gap*3.5+im_h)

        ids = draw.DrawImage(data=self.base64[i],
            location=(gap*2+(im_w-self.width[i])/2,
                canv_h-gap*2.5-(im_h-self.height[i])/2-offset-self.where))

        self.photo.append(ids)
        return

    def load(self, i):
        # Load photo i, resize it to fit im_w x im_h and store it as base64 PNG
        url = self.data[i]['urlToImage']
        if url == '': return False
        try:
            response = requests.get(url, timeout=10)
            if response.status_code != requests.codes.ok:
                return False
            im  = Image.open(BytesIO(response.content))
        except Exception:
            print('Failed: request/status code/open', url)
            return False
        if im.width==0 or im.height==0:
            return False
        im  = im.convert(mode='RGBA')
        if im.width*ratio >= im.height:
            self.width[i], self.height[i] = im_w, int(im.height*im_w/im.width)
        else:
            self.width[i], self.height[i] = int(im.width*im_h/im.height), im_h
        im  = im.resize((self.width[i], self.height[i]), resample=Image.LANCZOS)
        buffered = BytesIO()
        im.save(buffered, format="PNG")
        self.base64[i] = base64.b64encode(buffered.getvalue())
        return True

def wheel(event):
    # Mouse wheel event handler
    delta = int(event.delta/2)
    limit = min(0, canv_h-total_length)     # 0 when all news fit on the canvas
    if delta < 0:
        if news.where+delta <= limit:
            delta = limit - news.where
            news.where = limit
        else:
            news.where += delta
    elif delta > 0:
        if news.where+delta >= 0:
            delta = -news.where
            news.where = 0
        else:
            news.where += delta
    draw.Move(0, -delta)

def split(txt):
    # Split text into tokens: each run of chars below chr(256) is one token,
    # every other char (e.g. CJK) is its own token, whitespace becomes ' '
    txt = txt.strip()
    if txt == '':
        return []
    result = []
    string = ''
    for char in txt:
        if char in [' ', '\n', '\r']:
            if string != '':
                result.append(string)
                result.append(' ')
            string = ''
        elif char in ASCII:
            string += char
        else:
            if string != '':
                result.append(string)
            result.append(char)
            string = ''
    if string != '':
        result.append(string)
    return result

def wrap(txt, dist, lines_limit):
    # Insert '\n' so that no line is wider than dist pixels (measured with the
    # global tkinter font s); at most lines_limit lines are kept
    if txt == '':
        return ''
    tmp = split(txt)
    old_string = ''
    string = ''
    result = ''
    lines = 0
    for token in tmp:
        string += token
        if s.measure(string) > dist:
            result += old_string + '\n'
            lines += 1
            if token == ' ':
                string = old_string = ''
            else:
                string = old_string = token
        else:
            old_string = string
        if lines == lines_limit:
            old_string = ''
            break
    if old_string != '':
        result += old_string
        lines += 1
    return result

def Layout():
    # Window main Layout
    layout = [[sg.Text('Language', font=font, pad=((40,0),0)),
               sg.Combo(values=language, default_value=default, size=(20,1),
                    enable_events=True, key='Combo', readonly=True, font=font),
               sg.CalendarButton(start, size=(12,1), target='date1',
                    key='date1', format=date_fmt, font=font),
               sg.CalendarButton(stop,  size=(12,1), target='date2',
                    key='date2', format=date_fmt, font=font),
               sg.Text('Key Words', font=font, pad=((5,0),0)),
               sg.InputText(size=(50,1), font = font, pad=((10,0),0),
                    do_not_clear=True, focus=True)]]
    return layout

def update_window():
    global draw, total_length
    # Update window when new search
    global s
    s = FONT.Font(family='Segoe', size=16)
    if news.length == 0:
        sg.popup('No news found or server failed')
        return None

    total_length = (3.5*gap+im_h)*news.length+gap

    layout = Layout() + [[sg.Graph(canvas_size=(canv_w, canv_h), key='Graph',
        graph_bottom_left=(0,0), graph_top_right=(win_w, win_h),
        enable_events=True)]]

    window = sg.Window('News Center', layout=layout, finalize=True,
                return_keyboard_events=True)
    draw = window['Graph']

    for i in range(news.length):
        # Each News
        title  = wrap(str(i+1)+'. '+news.data[i]['title'], title_w, title_h)
        # Wrap description by desc_width
        if news.data[i]['description'] == '':
            desc = 'No description...'
        else:
            desc = wrap(news.data[i]['description'], desc_w, desc_h)
        offset = i*(gap*3.5+im_h)
        draw.DrawRectangle((gap, canv_h-gap-offset),
                (canv_w-gap, canv_h-gap*3.5-im_h-offset), line_color='grey',
                line_width=1)
        draw.DrawRectangle((gap*2, canv_h-offset-gap+16),
                (canv_w-gap*2, canv_h-offset-gap-16), line_color='green',
                fill_color='green')
        draw.DrawText(title, (gap*2+12, canv_h-int(gap/2)-offset),
                color='white', font=font, text_location='nw')
        draw.DrawText(desc, (gap*3+im_w, canv_h-gap*2.5-offset),
                color='white', font=font, text_location='nw')

    window['Graph'].Widget.bind('<MouseWheel>', wheel)

    return window

if os.name == 'nt':
    ctypes.windll.user32.SetProcessDPIAware()   # Windows only: GUI units are real pixels

# Usable option of Language for free user
Language    = {'Arabic':'ar', 'Chinese':'zh', 'Dutch':'nl', 'English':'en',
               'French':'fr', 'German':'de', 'Hebrew':'he', 'Italian':'it',
               'Northern Sami':'se', 'Norwegian':'no', 'Portuguese':'pt',
               'Russian':'ru', 'Spanish':'es', 'Swedish':'sv'}
language    = list(Language.keys())
language.sort()

ASCII       = [chr(i) for i in range(256)]
font        = 'Segoe 16'
pad         = 20
default     = 'English'
date_fmt    = '%Y-%m-%d'
now         = datetime.datetime.now()
stop        = now.strftime(date_fmt)
start       = (now + dateutil.relativedelta.relativedelta(months=-1))
start       = start.strftime(date_fmt)
month       = start     # earliest date a free user may query
page_size   = 100 # 100 is the maximum; a larger page_size loads more slowly
win_w       = 1620
win_h       = 720
im_w        = 326
im_h        = 145
ratio       = im_h/im_w
canv_w      = win_w
canv_h      = win_h
gap         = 25
title_w     = canv_w - 4*gap - 12
title_h     = 1
desc_w      = canv_w - 5*gap - im_w
desc_h      = 5
# You can get your API-Key on https://newsapi.org/register
my_api_key  = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

help    = '''
Keywords or phrases to search for in the article title and body.

Advanced search is supported here:

► Surround phrases with quotes (") for exact match.
► Prepend words or phrases that must appear with a + symbol. Eg: +bitcoin
► Prepend words that must not appear with a - symbol. Eg: -bitcoin
► Alternatively you can use the AND / OR / NOT keywords,
     and optionally group these with parentheses.
     Eg: crypto AND (ethereum OR litecoin) NOT bitcoin.

Date:  Free users can select dates no more than one month back.
Speed: Loading gets slower as page_size grows.
       It is now set to 100 (the maximum), so please wait a moment.
       You can change it to a smaller number, like 20.
URL:   Click on title of each news to browse source URL.
'''

sg.change_look_and_feel('DarkBrown2')

layout = Layout() + [[sg.Graph(canvas_size=(canv_w, canv_h), key='Graph',
        graph_bottom_left=(0,0), graph_top_right=(win_w, win_h))]]

window = sg.Window('News Center', layout=layout, finalize=True,
        return_keyboard_events=True)
window['Graph'].DrawText(help, (canv_w/2, canv_h/2),
        color='white', font=font)

news = None     # result of the latest search; None until the first search

while True:

    event, values = window.read()

    # Window Close
    if event is None:
        break

    # Search starts when the Enter key is pressed
    if event == '\r':
        if len(values[0]) != 0:
            # Update date information, free user limited in 1-month news
            new_start = window['date1'].GetText()
            new_stop  = window['date2'].GetText()
            start = new_start if new_start >= month else start
            stop  = new_stop if new_stop >= month else stop
            if stop < start:
                start, stop = stop, start
            found = News(values[0])
            if found.length == 0:
                # Keep the current window and results
                sg.popup('No news found or server failed')
                continue
            if news is not None:
                news.stop = True    # stop photo threads of the previous search
            news = found
            window1 = update_window()
            window.close()
            window = window1
            news.update()           # load the photos in background threads

    if event == 'Graph' and news is not None:
        # News title clicked: open the source URL in the web browser
        dist  = (canv_h-values['Graph'][1]-news.where)
        off   = dist % (3.5*gap+im_h) - gap
        index = int(dist / (3.5*gap+im_h))
        if ((-16<=off<=16) and (2*gap<=values['Graph'][0]<=canv_w-2*gap)
                    and (index < news.length)):
            webbrowser.open(news.data[index]['url'])

    if event == 'Combo':
        # Set default value to selection
        default = values['Combo']

window.close()
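The drawing code places every news item in a slot of 3.5*gap + im_h pixels; the wheel handler keeps the scroll offset news.where between a lower limit and 0, the click handler maps a graph point back to an item, and load() fits each photo into an im_w x im_h box. A sketch of that arithmetic as pure functions, so it can be checked without opening a window. The names scroll, clicked_index and fit_size are hypothetical; the constants are copied from the program. This sketch clamps the lower limit with min(0, ...), so a result list shorter than the canvas does not scroll at all:

```python
# Constants copied from the program (canv_w = win_w, canv_h = win_h)
GAP, IM_W, IM_H = 25, 326, 145
CANV_W, CANV_H = 1620, 720
PITCH = 3.5 * GAP + IM_H        # vertical space taken by one news item: 232.5


def scroll(where, delta, total_length):
    """Return the new scroll offset after a wheel step, kept between lowest and 0."""
    lowest = min(0, CANV_H - total_length)   # 0 when every item fits on the canvas
    return max(lowest, min(0, where + delta))


def clicked_index(x, y, where, count):
    """Return the index of the title bar under graph point (x, y), or None."""
    dist = CANV_H - y - where                # distance from the top of the content
    index = int(dist // PITCH)
    off = dist % PITCH - GAP                 # distance from the title bar's centre line
    if -16 <= off <= 16 and 2 * GAP <= x <= CANV_W - 2 * GAP and 0 <= index < count:
        return index
    return None


def fit_size(width, height, box_w=IM_W, box_h=IM_H):
    """Scale a photo to fit inside box_w x box_h while keeping its aspect ratio."""
    if width * box_h >= height * box_w:      # relatively wider than the box: fill the width
        return box_w, int(height * box_w / width)
    return int(width * box_h / height), box_h  # otherwise fill the height


total = PITCH * 100 + GAP                    # total_length for 100 results
print(scroll(0, -60, total), scroll(0, 60, total))   # -60 0
print(clicked_index(100, 695, 0, 100))               # 0
print(fit_size(1000, 200))                           # (326, 65)
```

Comparing width * box_h with height * box_w is the same test as the program's im.width*ratio >= im.height, written without floating-point division.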

Jason Yang
