How to Use AI to Help You Find a Job (with Full Python Code)

Jasmine

6 min read · Aug 26, 2021

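Have you sent out resume after resume without hearing anything back?

Let a machine learning model check your resume for you!

First, compare your resume with the job description to get a Match Score. The higher the score, the more closely the experience on your resume matches the job requirements, and the better your chances of getting an interview.

But what if the score is too low?

Run keyword extraction and summarization on the job description to find its most important content, then work those keywords into your resume.

The rest of this post is split into three parts that use AI to help you revise your resume and improve your chances of an interview!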

Match Score
Keyword Extraction
Text Summarization

Match Score

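For the job requirements I use Glassdoor's Data Scientist Job Description as the example, and I test against my own resume.

The similarity is computed with sklearn's cosine_similarity.

The resulting Match Score is 58.98, which is not a passing grade 😅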
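As a rough illustration of how such a score can be computed, here is a minimal sketch. It assumes both texts are turned into plain word-count vectors with sklearn's CountVectorizer before calling cosine_similarity; the function name match_score and the sample texts are hypothetical, and the vectorization in the full code may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def match_score(resume_text: str, job_text: str) -> float:
    """Return the cosine similarity of two texts as a 0-100 score."""
    # Assumption: plain word counts; the full code may vectorize differently.
    counts = CountVectorizer().fit_transform([resume_text, job_text])
    # cosine_similarity returns a 2x2 matrix; [0][1] is resume vs. job.
    similarity = cosine_similarity(counts)[0][1]
    return round(float(similarity) * 100, 2)


if __name__ == "__main__":
    # Hypothetical sample texts, not the resume or posting used in this post.
    resume = "Built machine learning models in Python to analyze company data."
    job = "We need a Data Scientist to analyze company data and build models."
    print(match_score(resume, job))
```

Because the score depends only on the words the two texts share, adding the job description's vocabulary to the resume is what pushes it up.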

Next, we adjust the resume by adding keywords and key points from the job description.

Keyword Extraction

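Gensim is a natural language processing toolkit. We use the summarize and keywords functions from its gensim.summarization module to find the important information in the job description; the higher the ratio parameter, the more keywords are returned. Note that gensim.summarization is part of Gensim 3.x and was removed in Gensim 4.0.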

[ Keywords ] 
data, business, experience, experiences, models, modeling, model, statistical, statistics, development, develop, techniques, regression, tools, insights, analysis, tree, trees, outcomes, job, company

Text Summarization

Treat the job description as an article and use text summarization to find its key points.

[ Extractive Text Summarization ]
We are looking for a Data Scientist who will support our product, sales, leadership and marketing teams with insights gained from analyzing company data. The ideal candidate is adept at using large data sets to find opportunities for product and process optimization and using models to test the effectiveness of different courses of action. Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.

Next, use the important information the model found in the job description to revise the resume!

machine learning -> data modeling 
help company -> Deliver insight to support business strategies, product development and marketing techniques.
+ improve business outcomes 
+ techniques: Regression, Random Forest, Boosting, Trees...

After these changes, the Match Score rose from 58.98 to 65.78.

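What if you want to compare your resume against several job postings?

Combine the steps above and sort the results by Match Score, and everything is clear at a glance!

The table includes the company name, Match Score, Keywords, and Summarization.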
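The sorting step might look like the sketch below. It covers only the Match Score column and assumes the postings arrive as a dict that maps a company name to its job description; rank_jobs and that input shape are hypothetical, not taken from the full code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_jobs(resume_text, jobs):
    """Rank job postings by Match Score against one resume.

    `jobs` maps a company name to its job description (hypothetical shape).
    Returns (company, score) pairs, highest score first.
    """
    companies = list(jobs)
    texts = [resume_text] + [jobs[name] for name in companies]
    counts = CountVectorizer().fit_transform(texts)
    # Row 0 is the resume; the remaining rows are the job descriptions.
    similarities = cosine_similarity(counts[0:1], counts[1:])[0]
    scored = [
        (name, round(float(sim) * 100, 2))
        for name, sim in zip(companies, similarities)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical postings for illustration only.
    postings = {
        "Company A": "Sales and marketing role for retail stores.",
        "Company B": "Data Scientist building statistical models in Python.",
    }
    for company, score in rank_jobs("Python statistical models and data", postings):
        print(company, score)
```

Keywords and a summary for each posting could be added as extra fields on each row in the same loop.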

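I hope that after reading "How to Use AI to Help You Find a Job," everyone lands a job that suits them 😎😎. This post shows only part of the code; the full version is on GitHub.
If you are also curious why one or two lines of code are enough to produce a text summary, read on!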

Gensim's summarization uses the TextRank algorithm, and TextRank is an extension of PageRank, so let's look at PageRank first.

[ Algorithms ] PageRank, TextRank

PageRank

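PageRank is one of the algorithms the Google search engine uses to rank web pages; it estimates a page's relevance and importance. Its core idea is the links between pages, which are represented as a graph. Each page is like an academic paper: the more often a paper is cited, or the more influential the papers that cite it, the more credible it is. Web pages work the same way: a page's rank depends on the number and quality of the links pointing to it.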
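To make the idea concrete, here is a minimal power-iteration sketch of PageRank on a tiny hand-made link graph. The damping factor of 0.85 is the commonly cited default; the function name, the dict-based graph format, and the sample pages are assumptions for illustration, and this is a simplified version rather than Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: A, B and D all link to C, and C links back to A.
    graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this graph C ranks highest because it has the most incoming links, and A outranks B and D because its single incoming link comes from the high-ranking C: quantity and quality of links both matter.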

TextRank

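TextRank extends the PageRank idea by treating each sentence as a web page. It judges how important a sentence is from its links and similarity to the other sentences, then picks the most important sentences as the summary. An important sentence is one that is highly similar to the other sentences.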
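The following is a simplified sketch of TextRank-style extractive summarization, not Gensim's actual implementation. It assumes a naive regex sentence splitter and a word-overlap similarity in the spirit of the original TextRank formulation; all function names and the sample text are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared unique words, normalized by the log of each sentence's size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # log(1) = 0 would make the denominator vanish
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def summarize(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Edge weights: similarity between every pair of distinct sentences.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    totals = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence passes on its score
        # in proportion to how similar it is to its neighbors.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / totals[j] * scores[j]
                for j in range(n) if totals[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    # Hypothetical text for illustration only.
    sample = (
        "Data scientists analyze company data. "
        "We build models from company data. "
        "Models help analyze product data. "
        "Lunch is free on Fridays."
    )
    print(summarize(sample, top_n=2))
```

A sentence that shares no words with the rest, like the last one in the sample, receives no score from its neighbors and is left out of the summary.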

[ Supplement ] Text Summarization

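Text summarization algorithms fall into two main categories: extractive and abstractive. Gensim is extractive: it pulls parts of the original text out to serve as the summary. The abstractive approach instead generates a summary automatically after understanding the meaning of the original text. Code for abstractive summarization with the T5 model is in this Colab. That's all for this post; if you're interested, take a look at the official T5 documentation.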