A Simple, Efficient Short-Link Generation Service in C#

Posted by 蔡大衛 on 2015-06-23

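One of our projects needed to shorten long URLs and push the results to customers through SMS, WeChat, and similar channels. At first we used a ready-made public service, but one weekend I got the itch to build my own long-URL (text) shortening service with a few distinctive features.

Having written socket services before, I still remembered how data packets are laid out, so my first instinct was to design the storage format. Instead of a database, I use two files:

Url.db stores the long URL text submitted by users. Url.idx stores the index: the position (Begin) and length (Length) of each submitted entry, plus some extra information (Hits, DateTime). Every new long URL is appended to both files, so even when the files grow large (several GB, say), writes put little pressure on I/O.

Now the structure of Url.idx. ID is the primary key, an Int64 that occupies 8 bytes once converted to a byte array. Next comes Begin, the length of Url.db just before the long URL is appended to it, also an Int64. A long URL's length is bounded, so Int16 is enough for Length: Int16.MaxValue == 32767, comfortably larger than the 4 KB cap this service places on a URL, and an Int16 takes 2 bytes. Hits is the number of times the short URL has been resolved, an Int32 of 4 bytes, and DateTime is stored as an Int64 of 8 bytes. The ID does not auto-increment as it would in a database, so that has to be done by hand: before writing to Url.idx, read the ID from the last row (rows are notional; a row is just the last 30 bytes), increment it, and then write the new row.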

In other words, each submitted long URL, however long it is (up to the 32767 bytes an Int16 can describe), grows Url.idx by exactly 30 bytes.
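The 30-byte layout can be sketched as a pair of pack/read helpers. This is only an illustration of the field order described above; `IndexRecordSketch` and its method names are hypothetical and do not appear in the original class.

```csharp
using System;

// Hypothetical helper: packs and reads one fixed-size index record
// in the order ID(8) + Begin(8) + Length(2) + Hits(4) + DateTime(8).
public static class IndexRecordSketch
{
    public const int RecordSize = 8 + 8 + 2 + 4 + 8; // 30 bytes

    public static byte[] Pack(long id, long begin, short length, int hits, DateTime time)
    {
        byte[] record = new byte[RecordSize];
        BitConverter.GetBytes(id).CopyTo(record, 0);               // bytes 0..7
        BitConverter.GetBytes(begin).CopyTo(record, 8);            // bytes 8..15
        BitConverter.GetBytes(length).CopyTo(record, 16);          // bytes 16..17
        BitConverter.GetBytes(hits).CopyTo(record, 18);            // bytes 18..21
        BitConverter.GetBytes(time.ToBinary()).CopyTo(record, 22); // bytes 22..29
        return record;
    }

    public static long ReadId(byte[] record) { return BitConverter.ToInt64(record, 0); }
    public static long ReadBegin(byte[] record) { return BitConverter.ToInt64(record, 8); }
    public static short ReadLength(byte[] record) { return BitConverter.ToInt16(record, 16); }
    public static int ReadHits(byte[] record) { return BitConverter.ToInt32(record, 18); }

    public static DateTime ReadTime(byte[] record)
    {
        return DateTime.FromBinary(BitConverter.ToInt64(record, 22));
    }
}
```

Because every record has the same size, record number n always starts at byte n * 30, which is what makes the later lookup by file position possible.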

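Once the data structure is settled, the rest of the service becomes straightforward. Two consecutive submissions might yield the short URLs http://domain/1000 and http://domain/1001. That is plainly ugly: the ID after the domain is all digits and obviously sequential, so the whole data set is easy to enumerate by brute force. Decimal digits also carry little information per character: submit a million long URLs and the short URLs keep getting longer, which defeats the purpose.

So the ID needs reworking, with two goals:

1. Add obfuscation, so that two adjacent IDs show no visible relationship.

2. Add capacity, so that submitting a million long URLs at once does not noticeably lengthen the ID.

The simplest, most direct obfuscation is to convert from base 10 to base 62 (0-9a-zA-Z). Because an alphabet in natural order (abcdef…) makes the next ID easy to guess, the 62 characters are shuffled once into a random order: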

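/// <summary>
/// Generates a randomly ordered string of the characters 0-9, a-z and A-Z.
/// </summary>
/// <returns>A 62-character string containing each character exactly once.</returns>
public static string GenerateKeys()
{
    string[] Chars = "0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z".Split(',');
    int SeekSeek = unchecked((int)DateTime.Now.Ticks);
    Random SeekRand = new Random(SeekSeek);
    // Fisher-Yates shuffle. The earlier Next(1, Chars.Length) / Chars[r - 1] swap
    // could never reach the last element, so 'z' always stayed in place.
    for (int i = Chars.Length - 1; i > 0; i--)
    {
        int r = SeekRand.Next(0, i + 1); // 0 <= r <= i (the upper bound is exclusive)
        string f = Chars[i];
        Chars[i] = Chars[r];
        Chars[r] = f;
    }
    return string.Join("", Chars);
}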

Running the method above once produced this random sequence:

string Seq = "s9LFkgy5RovixI1aOf8UhdY3r4DMplQZJXPqebE0WSjBn7wVzmN2Gc6THCAKut";
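Before hard-coding a generated sequence, it may be worth confirming that it really is a permutation of the 62 characters, since a duplicated or missing character would make decoding ambiguous. A minimal sketch; `AlphabetCheckSketch` and `IsValidAlphabet` are hypothetical names, not part of the original class.

```csharp
using System;
using System.Linq;

// Hypothetical sanity check for a shuffled base-62 alphabet.
public static class AlphabetCheckSketch
{
    public static bool IsValidAlphabet(string seq)
    {
        return seq != null
            && seq.Length == 62                 // exactly 62 symbols
            && seq.Distinct().Count() == 62     // no symbol repeated
            && seq.All(c => (c >= '0' && c <= '9')
                         || (c >= 'a' && c <= 'z')
                         || (c >= 'A' && c <= 'Z')); // only 0-9, a-z, A-Z
    }
}
```

The sequence shown above passes this check: it contains each digit and each upper- and lower-case letter exactly once.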

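Using this sequence in place of 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ makes the encoded value look nothing like the original number: a decimal ID converted to base 62 with it becomes unrecognizable. The conversion methods: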

/// <summary>
/// Converts a base-10 ID to base 62 using the shuffled alphabet Seq.
/// </summary>
/// <param name="id">A non-negative ID.</param>
/// <returns>The base-62 string.</returns>
private static string Convert(long id)
{
    if (id < 62)
    {
        return Seq[(int)id].ToString();
    }
    int y = (int)(id % 62);
    long x = id / 62;

    return Convert(x) + Seq[y];
}

/// <summary>
/// Converts a base-62 string back to base 10.
/// </summary>
/// <param name="Num">The base-62 string.</param>
/// <returns>The decoded ID, or -1 if Num contains a character outside Seq.</returns>
private static long Convert(string Num)
{
    long v = 0;
    // Integer arithmetic (Horner's method); Math.Pow works in doubles and
    // loses precision once the value exceeds 2^53.
    foreach (char c in Num)
    {
        int t = Seq.IndexOf(c);
        if (t < 0) return -1; // not a valid key; -1 never matches a stored ID
        v = v * 62 + t;
    }
    return v;
}

For example, Convert(123456789) returns RYswX and Convert(123456790) returns RYswP.

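By analyzing a large number of consecutive values, someone could still work out the Seq sequence and then guess the IDs on either side of a known one. To strengthen the obfuscation further, the ID increases not by a fixed 1 but by a random amount each time, for example 1000, 1005, 1013, 1014, 1020, with no visible pattern.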

private static Int16 GetRnd(Random seekRand)
{
    // Next(1, 11) returns 1..10; the upper bound is exclusive.
    Int16 s = (Int16)seekRand.Next(1, 11);
    return s;
}
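To see what the random step does to a run of IDs, the following sketch generates a batch starting from a given last ID. `IdStepSketch` and `NextIds` are hypothetical names used only for illustration; the step range of 1 to 10 matches GetRnd above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of ID allocation with a random step.
public static class IdStepSketch
{
    // Returns `count` IDs, each 1..10 larger than the one before it.
    public static List<long> NextIds(long last, int count, Random rand)
    {
        var ids = new List<long>(count);
        for (int i = 0; i < count; i++)
        {
            last += rand.Next(1, 11); // step of 1..10
            ids.Add(last);
        }
        return ids;
    }
}
```

The IDs stay strictly increasing, which is what later allows the index file to be searched in sorted order, but the gap between neighbors is no longer predictable.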

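Even if someone reverses a base-62 value into its decimal ID, the neighboring IDs are harder to guess, which raises the cost of brute-force enumeration. Still, two consecutive base-62 values such as RYswX and RYswP above differ only in the last character and look very similar. So a third, simple layer of obfuscation rotates the base-62 string left by some number of positions (and rotates it back by the same amount when resolving):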

/// <summary>
/// Obfuscates an ID into a string.
/// </summary>
/// <param name="id">The numeric ID.</param>
/// <returns>The base-62 key, rotated left.</returns>
private static string Mixup(long id)
{
    string Key = Convert(id);
    int s = 0;
    foreach (char c in Key)
    {
        s += (int)c;
    }
    int Len = Key.Length;
    // Rotation amount. The character sum does not change when the string
    // is rotated, so UnMixup can recompute the same value from the result.
    int x = (s % Len);
    char[] arr = Key.ToCharArray();
    char[] newarr = new char[arr.Length];
    Array.Copy(arr, x, newarr, 0, Len - x);
    Array.Copy(arr, 0, newarr, Len - x, x);
    return new string(newarr);
}

/// <summary>
/// Reverses the obfuscation and returns the ID.
/// </summary>
/// <param name="Key">The obfuscated key.</param>
/// <returns>The numeric ID, or -1 for an empty key.</returns>
private static long UnMixup(string Key)
{
    if (string.IsNullOrEmpty(Key)) return -1; // avoids dividing by zero below
    int s = 0;
    foreach (char c in Key)
    {
        s += (int)c;
    }
    int Len = Key.Length;
    int x = (s % Len);
    x = Len - x; // rotating left by Len - x undoes a left rotation by x
    char[] arr = Key.ToCharArray();
    char[] newarr = new char[arr.Length];
    Array.Copy(arr, x, newarr, 0, Len - x);
    Array.Copy(arr, 0, newarr, Len - x, x);
    return Convert(new string(newarr));
}

Mixup(123456789) returns wXRYs. If the random increment is 7, the next record's ID gives Mixup(123456796) = swWRY, and by eye it is hard to tell that the two IDs are adjacent.
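The two examples can be reproduced with a compact, self-contained sketch of the encode, rotate, and decode steps. It is a restatement for experimentation rather than the class's own code: `MixupSketch` is a hypothetical name, the base-62 conversion is written iteratively, and keys are assumed to be non-empty.

```csharp
using System;
using System.Text;

// Hypothetical stand-alone version of the base-62 + rotation scheme.
public static class MixupSketch
{
    const string Seq = "s9LFkgy5RovixI1aOf8UhdY3r4DMplQZJXPqebE0WSjBn7wVzmN2Gc6THCAKut";

    static string Encode(long id)
    {
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Seq[(int)(id % 62)]); // prepend the next digit
            id /= 62;
        } while (id > 0);
        return sb.ToString();
    }

    static long Decode(string key)
    {
        long v = 0;
        foreach (char c in key)
        {
            int digit = Seq.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            v = v * 62 + digit;
        }
        return v;
    }

    // Sum of character codes modulo the length; unchanged by rotation.
    static int Shift(string key)
    {
        int sum = 0;
        foreach (char c in key) sum += c;
        return sum % key.Length;
    }

    public static string Mixup(long id)
    {
        string key = Encode(id);
        int x = Shift(key);
        return key.Substring(x) + key.Substring(0, x); // rotate left by x
    }

    public static long UnMixup(string key)
    {
        int x = (key.Length - Shift(key)) % key.Length;
        return Decode(key.Substring(x) + key.Substring(0, x)); // rotate back
    }
}
```

Here `MixupSketch.Mixup(123456789)` yields "wXRYs", `MixupSketch.Mixup(123456796)` yields "swWRY", and `MixupSketch.UnMixup("wXRYs")` returns 123456789.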

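That covers the data structure and the ID obfuscation. Next is how a short URL is resolved.

Given a short URL key such as wXRYs, the UnMixup() method above recovers the ID. Because IDs do not increase in steps of 1, the record's position in the index file cannot be computed directly from the ID (as in ID * 30). But the IDs are stored in ascending order, so binary search is the natural way to locate one in the index file.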

// Core of the binary search
FileStream Index = new FileStream(IndexFile, FileMode.OpenOrCreate, FileAccess.ReadWrite);
byte[] buff = new byte[8];
long Id = UnMixup(Key); // the real ID decoded from the short URL (Key stands for the key string, e.g. "wXRYs")
long Left = 0;
long Right = Index.Length / 30 - 1;
long Middle = -1;
bool Found = false;
while (Left <= Right)
{
    Middle = Left + (Right - Left) / 2; // integer division already rounds down
    Index.Position = Middle * 30;
    Index.Read(buff, 0, 8);
    long val = BitConverter.ToInt64(buff, 0);
    if (val == Id) { Found = true; break; }
    if (val < Id)
    {
        Left = Middle + 1;
    }
    else
    {
        Right = Middle - 1;
    }
}
// If Found is true, Middle is the record number; otherwise the ID is not in the index.

Index.Close();

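Binary search repeatedly moves the file pointer, reads the 8 bytes at the midpoint, converts them to a number, and compares it with the target ID. It is very fast: with close to 4.3 billion short-URL records, finding an ID takes at most 32 pointer moves (32 iterations of the while loop above), because 2^32 = 4294967296.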
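The same search can be packaged as a small function over any seekable stream, which makes it easy to try against an in-memory index and makes the "ID not present" case explicit. A minimal sketch; `IndexSearchSketch` and `FindRecord` are hypothetical names, and a short read is simply treated as "not found".

```csharp
using System;
using System.IO;

// Hypothetical helper: binary search over fixed 30-byte index records.
public static class IndexSearchSketch
{
    const int RecordSize = 30;

    // Returns the zero-based record number whose ID equals `id`, or -1 if absent.
    public static long FindRecord(Stream index, long id)
    {
        byte[] buff = new byte[8];
        long left = 0;
        long right = index.Length / RecordSize - 1;
        while (left <= right)
        {
            long middle = left + (right - left) / 2;
            index.Position = middle * RecordSize;
            if (index.Read(buff, 0, 8) != 8) return -1; // short read: give up
            long val = BitConverter.ToInt64(buff, 0);
            if (val == id) return middle;
            if (val < id) left = middle + 1;
            else right = middle - 1;
        }
        return -1; // the interval is empty, so the ID is not stored
    }
}
```

With a `MemoryStream` holding records whose IDs are 1000003, 1000007 and 1000012, `FindRecord` returns 1 for 1000007 and -1 for 1000008.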

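Binary search is needed only because of the random increment. With a fixed increment of 1 it could be skipped: the record's position in the index file follows directly from the ID (ID * 30, counted from the first ID stored).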
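For the fixed-step variant, the offset calculation might look like the sketch below. It assumes IDs are consecutive and that the first stored ID is known; `DirectOffsetSketch`, `OffsetOf`, and the `firstId` parameter are hypothetical and not part of the original class.

```csharp
// Hypothetical helper for an index whose IDs increase by exactly 1.
public static class DirectOffsetSketch
{
    const int RecordSize = 30;

    // Returns the byte offset of the record for `id`, or -1 if it lies outside the file.
    public static long OffsetOf(long id, long firstId, long fileLength)
    {
        if (id < firstId) return -1;
        long offset = (id - firstId) * RecordSize;
        return offset + RecordSize <= fileLength ? offset : -1;
    }
}
```

The trade-off is the one the article describes: direct addressing is a single seek, but consecutive IDs are easier to enumerate.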

Below is the complete code, wrapped in a ShortenUrl class:

using System;
using System.Linq;
using System.Web;
using System.IO;
using System.Text;

/// <summary>
/// Shortens long URLs into obfuscated base-62 keys and resolves them back, using two flat files.
/// </summary>
public class ShortenUrl
{
    const string Seq = "s9LFkgy5RovixI1aOf8UhdY3r4DMplQZJXPqebE0WSjBn7wVzmN2Gc6THCAKut";

    private static string DataFile
    {
        get { return HttpContext.Current.Server.MapPath("/Url.db"); }
    }

    private static string IndexFile
    {
        get { return HttpContext.Current.Server.MapPath("/Url.idx"); }
    }

    /// <summary>
    /// Adds a batch of URLs and returns their keys in the same order. If an input URL is invalid, the element at the same index of the returned array is null.
    /// </summary>
    /// <param name="Url">The long URLs to shorten (at most 1000 are processed per call).</param>
    /// <returns>The short keys, with null for rejected entries.</returns>
    public static string[] AddUrl(string[] Url)
    {
        DateTime Now = DateTime.Now;
        byte[] dt = BitConverter.GetBytes(Now.ToBinary());
        int _Hits = 0;
        byte[] Hits = BitConverter.GetBytes(_Hits);
        string[] ResultKey = new string[Url.Length];
        int seekSeek = unchecked((int)Now.Ticks);
        Random seekRand = new Random(seekSeek);
        string Host = HttpContext.Current.Request.Url.Host.ToLower();
        // "using" closes both files even if an exception is thrown.
        using (FileStream Index = new FileStream(IndexFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
        using (FileStream Data = new FileStream(DataFile, FileMode.Append, FileAccess.Write))
        {
            byte[] buff = new byte[8];
            long Last = 1000000; // starting ID; if it is too small, the generated keys are too short
            if (Index.Length >= 30) // read the ID of the last record once
            {
                Index.Position = Index.Length - 30;
                Index.Read(buff, 0, 8);
                Last = BitConverter.ToInt64(buff, 0);
                Index.Position = Index.Length; // continue appending after the last record
            }
            //index: ID(8) + Begin(8) + Length(2) + Hits(4) + DateTime(8) = 30
            for (int i = 0; i < Url.Length && i < 1000; i++)
            {
                // Skip null or empty input, over-long URLs, and URLs that point back at this host.
                if (string.IsNullOrEmpty(Url[i]) || Url[i].Length > 4096 || Url[i].ToLower().Contains(Host)) continue;
                long Begin = Data.Position;
                byte[] UrlData = Encoding.UTF8.GetBytes(Url[i]);
                Data.Write(UrlData, 0, UrlData.Length);
                long RandKey = Last + (long)GetRnd(seekRand);
                Last = RandKey; // the next record continues from this ID
                byte[] BeginData = BitConverter.GetBytes(Begin);
                byte[] LengthData = BitConverter.GetBytes((Int16)(UrlData.Length));
                byte[] RandKeyData = BitConverter.GetBytes(RandKey);

                Index.Write(RandKeyData, 0, 8);
                Index.Write(BeginData, 0, 8);
                Index.Write(LengthData, 0, 2);
                Index.Write(Hits, 0, Hits.Length);
                Index.Write(dt, 0, dt.Length);
                ResultKey[i] = Mixup(RandKey);
            }
        }
        return ResultKey;
    }

    /// <summary>
    /// Resolves a batch of keys in order and returns the corresponding long URLs.
    /// </summary>
    /// <param name="Key">The short keys to resolve.</param>
    /// <returns>The long URLs, with null for keys that are not in the index.</returns>
    public static string[] ParseUrl(string[] Key)
    {
        long[] Ids = Key.Select(n => UnMixup(n)).ToArray();
        string[] Result = new string[Ids.Length];
        // "using" closes both files even if an exception is thrown.
        using (FileStream Index = new FileStream(IndexFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
        using (FileStream Data = new FileStream(DataFile, FileMode.Open, FileAccess.Read))
        {
            byte[] buff = new byte[8];
            long _Right = Index.Length / 30 - 1;
            for (int j = 0; j < Ids.Length; j++)
            {
                long Id = Ids[j];
                long Left = 0;
                long Right = _Right;
                long Middle = -1;
                bool Found = false;
                while (Left <= Right)
                {
                    Middle = Left + (Right - Left) / 2; // integer division already rounds down
                    Index.Position = Middle * 30;
                    Index.Read(buff, 0, 8);
                    long val = BitConverter.ToInt64(buff, 0);
                    if (val == Id) { Found = true; break; }
                    if (val < Id)
                    {
                        Left = Middle + 1;
                    }
                    else
                    {
                        Right = Middle - 1;
                    }
                }
                string Url = null;
                // Testing Middle != -1 is not enough: after a failed search Middle still
                // holds the last probed record, which would return an unrelated URL.
                if (Found)
                {
                    Index.Position = Middle * 30 + 8; // skip the ID
                    Index.Read(buff, 0, buff.Length);
                    long Begin = BitConverter.ToInt64(buff, 0);
                    Index.Read(buff, 0, buff.Length); // Length(2) + Hits(4) + first 2 bytes of DateTime
                    Int16 Length = BitConverter.ToInt16(buff, 0);
                    byte[] UrlTxt = new byte[Length];
                    Data.Position = Begin;
                    Data.Read(UrlTxt, 0, UrlTxt.Length);
                    int Hits = BitConverter.ToInt32(buff, 2); // skip the 2-byte Length
                    byte[] NewHits = BitConverter.GetBytes(Hits + 1); // increment the hit count (4 bytes)
                    Index.Position -= 6; // move the pointer back to just after Length
                    Index.Write(NewHits, 0, NewHits.Length); // overwrite the old Hits
                    Url = Encoding.UTF8.GetString(UrlTxt);
                }
                Result[j] = Url;
            }
        }
        return Result;
    }

    /// <summary>
    /// Obfuscates an ID into a string.
    /// </summary>
    /// <param name="id">The numeric ID.</param>
    /// <returns>The base-62 key, rotated left.</returns>
    private static string Mixup(long id)
    {
        string Key = Convert(id);
        int s = 0;
        foreach (char c in Key)
        {
            s += (int)c;
        }
        int Len = Key.Length;
        // Rotation amount. The character sum does not change when the string
        // is rotated, so UnMixup can recompute the same value from the result.
        int x = (s % Len);
        char[] arr = Key.ToCharArray();
        char[] newarr = new char[arr.Length];
        Array.Copy(arr, x, newarr, 0, Len - x);
        Array.Copy(arr, 0, newarr, Len - x, x);
        return new string(newarr);
    }

    /// <summary>
    /// Reverses the obfuscation and returns the ID.
    /// </summary>
    /// <param name="Key">The obfuscated key.</param>
    /// <returns>The numeric ID, or -1 for an empty key.</returns>
    private static long UnMixup(string Key)
    {
        if (string.IsNullOrEmpty(Key)) return -1; // avoids dividing by zero below
        int s = 0;
        foreach (char c in Key)
        {
            s += (int)c;
        }
        int Len = Key.Length;
        int x = (s % Len);
        x = Len - x; // rotating left by Len - x undoes a left rotation by x
        char[] arr = Key.ToCharArray();
        char[] newarr = new char[arr.Length];
        Array.Copy(arr, x, newarr, 0, Len - x);
        Array.Copy(arr, 0, newarr, Len - x, x);
        return Convert(new string(newarr));
    }

    /// <summary>
    /// Converts a base-10 ID to base 62 using the shuffled alphabet Seq.
    /// </summary>
    /// <param name="id">A non-negative ID.</param>
    /// <returns>The base-62 string.</returns>
    private static string Convert(long id)
    {
        if (id < 62)
        {
            return Seq[(int)id].ToString();
        }
        int y = (int)(id % 62);
        long x = id / 62;

        return Convert(x) + Seq[y];
    }

    /// <summary>
    /// Converts a base-62 string back to base 10.
    /// </summary>
    /// <param name="Num">The base-62 string.</param>
    /// <returns>The decoded ID, or -1 if Num contains a character outside Seq.</returns>
    private static long Convert(string Num)
    {
        long v = 0;
        // Integer arithmetic (Horner's method); Math.Pow works in doubles and
        // loses precision once the value exceeds 2^53.
        foreach (char c in Num)
        {
            int t = Seq.IndexOf(c);
            if (t < 0) return -1; // not a valid key; -1 never matches a stored ID
            v = v * 62 + t;
        }
        return v;
    }

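    /// <summary>
    /// Generates a randomly ordered string of the characters 0-9, a-z and A-Z.
    /// </summary>
    /// <returns>A 62-character string containing each character exactly once.</returns>
    public static string GenerateKeys()
    {
        string[] Chars = "0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z".Split(',');
        int SeekSeek = unchecked((int)DateTime.Now.Ticks);
        Random SeekRand = new Random(SeekSeek);
        // Fisher-Yates shuffle. The earlier Next(1, Chars.Length) / Chars[r - 1] swap
        // could never reach the last element, so 'z' always stayed in place.
        for (int i = Chars.Length - 1; i > 0; i--)
        {
            int r = SeekRand.Next(0, i + 1); // 0 <= r <= i (the upper bound is exclusive)
            string f = Chars[i];
            Chars[i] = Chars[r];
            Chars[r] = f;
        }
        return string.Join("", Chars);
    }

    /// <summary>
    /// Returns the random increment for the next ID.
    /// </summary>
    /// <param name="SeekRand">The random number generator to draw from.</param>
    /// <returns>A step from 1 to 10.</returns>
    private static Int16 GetRnd(Random SeekRand)
    {
        Int16 Step = (Int16)SeekRand.Next(1, 11);
        return Step;
    }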
}

Advantages of this approach:

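The decimal ID is converted to base-62 characters, and six base-62 characters can represent 62^6, about 56.8 billion values. With a random increment of 1 to 10 (5.5 on average), six characters still hold roughly 10.3 billion records, a volume that would already strain many ordinary database setups. Each long URL is appended on write, so write performance is good. Resolving a short URL uses binary search, which only moves the file pointer and reads 8 bytes into a buffer at each step, so lookups are fast as well.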
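The capacity estimate is simple arithmetic and can be checked with a few lines. `CapacitySketch` and its method names are hypothetical; the average step of 5.5 assumes the increment is drawn uniformly from 1 to 10, and the starting ID of 1000000 is ignored as negligible.

```csharp
using System;

// Hypothetical calculator for the key-space estimate.
public static class CapacitySketch
{
    // Number of distinct base-62 strings of the given length: 62^digits.
    public static long KeySpace(int digits)
    {
        long total = 1;
        for (int i = 0; i < digits; i++) total = checked(total * 62);
        return total;
    }

    // Approximate number of records that fit when IDs advance by averageStep.
    public static long ExpectedRecords(int digits, double averageStep)
    {
        return (long)(KeySpace(digits) / averageStep);
    }
}
```

`CapacitySketch.KeySpace(6)` is 56800235584, and dividing by an average step of 5.5 gives a little over 10.3 billion records.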

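Drawback: under high concurrency, I/O errors such as failing to open the files may occur. Reimplementing the service in single-threaded Node.js might avoid this.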
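Staying in C#, one way to reduce such collisions is to funnel all file access through a single lock so that requests take turns. This is only a sketch under the assumption that the service runs in one worker process; `FileGateSketch` and `Run` are hypothetical names, and an in-process lock does nothing for multiple processes or machines sharing the same files.

```csharp
using System;

// Hypothetical gate that serializes file operations within one process.
public static class FileGateSketch
{
    private static readonly object Gate = new object();

    // Runs the given operation while holding the lock and returns its result.
    public static T Run<T>(Func<T> fileOperation)
    {
        lock (Gate)
        {
            return fileOperation();
        }
    }
}
```

A caller could then write `FileGateSketch.Run(() => ShortenUrl.AddUrl(urls))`, at the cost of handling one batch at a time.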
