木鸟杂记

大规模数据系统

Hosting a Hexo Static Blog With Vercel

My blog was originally hosted on GitHub Pages, but it seems Baidu’s crawler was too aggressive and got banned by GitHub. According to marketmechian data, in mainland China’s search engine market, Baidu still holds half the market share:

  • Baidu: 67.09%
  • Sogou: 18.75%
  • Shenma: 6.84%
  • Google: 2.64%
  • bing: 2.6%
  • Other: 2.08%

As a Chinese blog, I still hope to be seen by more domestic users, so I’ve been looking for a way to let Baidu’s crawler automatically index my blog. While browsing blogs, I came across someone recommending zeit.co as a hosting platform. After trying it out, I found it to be an excellent static site hosting + CI Serverless Function platform, and I’m recommending it here.

Author: 木鸟杂记 (Muniao Notes) https://www.qtmuniao.com/2020/03/15/hexo-to-zeit-co/, please indicate the source when reposting

Several Methods

There are many methods online to make Baidu’s crawler index blog pages. The main ones are:

  1. CDN, using cloud service providers to create multiple mirrors of the blog.
  2. Switch hosting platforms, such as domestic code hosting platforms.
  3. Self-host using a VPS.

CDN is relatively expensive, I don’t want to switch hosting platforms, and VPS access speed from mainland China isn’t great. So this matter was put on hold.

zeit.co (Vercel)

While searching for information one day, I noticed a blogger [1] mentioning zeit.co, a cloud platform for hosting static web pages and stateless functions. After trying it out, I found it very powerful. It can integrate with GitHub as a GitHub App, taking over repo CI for deep customization, with fast deployment and automatic scaling.

The key point is that the basic features are currently free, which is more than enough for hosting a blog. What’s even more surprising is that it also offers Smart CDN and Smart DNS.

zeit-price.pngzeit-price.png

Migration

After understanding zeit’s advantages, I started working on migrating the blog’s hosting method. The main steps are as follows:

  1. Static code generation modification
  2. Import code repository into zeit
  3. Modify DNS pointing

Static Code Generation

Not everyone needs this step; I only need it because I didn’t fully understand hexo’s workflow before and had caused my own problems.

Previously, I only used one repo to manage all blog code: using the master branch as the hexo development branch and the gh-pages branch as the static code branch, then pointing the repo’s GitHub Pages to the gh-pages branch to publish the blog.

Since zeit can only perform CI on a repo’s default branch (i.e., master), there are at least two ways to host a hexo blog using zeit:

  1. Host Hexo development code. Use zeit to take over the hexo repository code and configure CI for it: every time code is pushed to master, trigger CI to compile and generate static web page code to the public folder, then publish the public folder.
  2. Host static web page code. Have hexo generate static web page code to the master branch of a new repo each time, then import that repo into zeit for hosting.

When I tried the first method, I found that hexo build kept failing. Later I discovered that because I had deeply customized NexT and modified some of NexT theme’s code, but hadn’t committed it to the repository. Because my local NexT theme code was still linked to the official NexT repo via git to pull the latest features in time, I hadn’t included it in the hexo repository. There should be a solution if I dug deeper into this, but I was lazy, so I chose the simpler second hosting method:

Modify the deployment method in the configuration file hexo/_config.yml of the hexo codebase as follows:

1
2
3
4
5
6
78 # Deployment
79 ## Docs: https://hexo.io/docs/deployment.html
80 deploy:
81 type: git
82 repo: https://github.com/you-git-name/blog-publish.git
83 branch: master

Every time the command hexo d -g is executed, the generated static web page code will be pushed to the repo blog-publish:

zeit-repo-publish.pngzeit-repo-publish.png

There are also additional benefits. Previously, I foolishly exposed various configurations (including some of my website’s secrets) in a public git repository (because private repositories can’t use GitHub Pages). With the second deployment method, I set both my development repo and static code repo to private. In addition, zeit will obtain all read-write permissions for the imported repo, which is a bit concerning. Therefore, using an additional static code repo is quite appropriate here.

Import Code Repo into zeit

Zeit’s web interface is very clean and fairly easy to use.

Sign up. Open the login page: https://zeit.co/login, and log in with your GitHub account.

Import. On the main page, click “Import Project”, then “Import Project From Git Repo”, and select GitHub. Perform authorization, noting that you only need to import the static code repository blog-publish.

Deploy. During the import process, just select the “other” template. The default CI command doesn’t need to be changed, and the Root Directory doesn’t need to be changed either. After the import is complete, it will automatically deploy for you.

Finally, you can see the URL generated by zeit in the deployment card:

zeit-deploy.pngzeit-deploy.png

Modify DNS Pointing

If you have your own domain, you need to add a CNAME record in the DNS resolution of your domain service provider (Alibaba Cloud, GoDaddy, etc.):

1
www  CNAME	默认	blog-publish.now.sh.	0	600

Baidu Crawling

Finally, on Baidu’s Webmaster Tools page, under Data Monitoring > Crawl Diagnosis, perform a test.

The frustrating thing is that I had previously configured a dedicated route for Baidu’s crawler in my domain resolution, directing it to my VPS (which ultimately didn’t work, otherwise this article wouldn’t exist). After removing that record, Baidu’s crawler doesn’t update the IP corresponding to my domain in a timely manner. After clicking the error report, it says it will update in a few minutes, but several hours passed and it still hadn’t updated. But that’s another story.

zeit-baidu.pngzeit-baidu.png

References

[1] Solving the Problem of Baidu Crawler Being Unable to Crawl GitHub Pages Personal Blog: https://zpjiang.me/2020/01/15/let-baidu-index-github-page/

[2] zeit.co Help Documentation: https://zeit.co/docs


我是青藤木鸟,一个喜欢摄影、专注大规模数据系统的程序员,欢迎关注我的公众号:“木鸟杂记”,有更多的分布式系统、存储和数据库相关的文章,欢迎关注。 关注公众号后,回复“资料”可以获取我总结一份分布式数据库学习资料。 回复“优惠券”可以获取我的大规模数据系统付费专栏《系统日知录》的八折优惠券。

我们还有相关的分布式系统和数据库的群,可以添加我的微信号:qtmuniao,我拉你入群。加我时记得备注:“分布式系统群”。 另外,如果你不想加群,还有一个分布式系统和数据库的论坛(点这里),欢迎来玩耍。

wx-distributed-system-s.jpg