SQL Server擷取字串和處理中文技巧

Wentasy發表於2014-04-26

原文網址 : https://blog.csdn.net/justdb/article/details/24516997

一環境介紹

SQL  Server

PRINT @@VERSION
MicrosoftSQLServer2012-11.0.2100.60(X64)
Feb10201219:39:15
Copyright(c)MicrosoftCorporation
EnterpriseEdition:Core-basedLicensing(64-bit)onWindowsNT6.1(Build7601:ServicePack1)

作業系統
------------------
System Information
------------------
Operating System: Windows 7 Ultimate 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130828-1532)
System Model: Aspire E1-471G
Processor: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz (4 CPUs), ~2.6GHz
Memory: 4096MB RAM

二實現功能

從一大堆有包含中文字元和編號的字串中過濾出編號。

三實現模擬

首先，我們準備測試資料，注意，這裡的資料全部都是模擬資料，無實際含義。語句如下：

CREATE TABLE #temp
(
   name VARCHAR(80)
);

INSERT INTO #temp
VALUES     ('五道口店3059');

INSERT INTO #temp
VALUES     ('五羊邨店3060');

INSERT INTO #temp
VALUES     ('楊家屯店3061');

INSERT INTO #temp
VALUES     ('十里堤店3062');

INSERT INTO #temp
VALUES     ('中關村店3063');

INSERT INTO #temp
VALUES     ('麗秀店3064');

INSERT INTO #temp
VALUES     ('石門店3065');

INSERT INTO #temp
VALUES     ('黃村店3066');

INSERT INTO #temp
VALUES     ('東圃店3067');

INSERT INTO #temp
VALUES     ('天河店3068');

INSERT INTO #temp
VALUES     ('人民路廣場3069');

INSERT INTO #temp
VALUES     ('社群中心3070');

INSERT INTO #temp
VALUES     ('珠海市3071');

INSERT INTO #temp
VALUES     ('麗都3072');

INSERT INTO #temp
VALUES     ('曉月3073');

INSERT INTO #temp
VALUES     ('舊區3074');

INSERT INTO #temp
VALUES     ('新城3075');

INSERT INTO #temp
VALUES     ('水井溝3076');

然後，我們觀察資料，發現這些資料都有規律，編號是數字，佔4個字元。數字前面包含店、場、心、市、都、月、區、城、溝共9個字元。
我們試著採用SQL Server內建的函式Substring、Charindex、Rtrim、Ltrim過濾掉出現次數最多（店）的字串。
語句如下：

SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS name
INTO   #t1
FROM   #temp

以下是這幾個函式的使用說明：

Substring

Returns the part of a character expression that starts at the specified position and has the specified length. The position parameter and the length parameter must evaluate to integers.

Syntax

SUBSTRING(character_expression, position, length)

Arguments

character_expression

Is a character expression from which to extract characters.

position

Is an integer that specifies where the substring begins.

length

Is an integer that specifies the length of the substring as number of characters.

Result Types

DT_WSTR

Charindex
Searches an expression for another expression and returns its starting position if found.

Syntax

CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] )

Arguments
expressionToFind
Is a character expression that contains the sequence to be found. expressionToFind is limited to 8000 characters.
expressionToSearch
Is a character expression to be searched.
start_location
Is an integer or bigint expression at which the search starts. If start_location is not specified, is a negative number, or is 0, the search starts at the beginning of expressionToSearch.

Return Types
bigint if expressionToSearch is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, int.

Rtrim
Returns a character expression after removing trailing spaces.

RTRIM does not remove white space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET RTrim method in a script run from the Script component.

Syntax
RTRIM(character expression)

Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR

Ltrim
Returns a character expression after removing leading spaces.

LTRIM does not remove white-space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET LTrim method in a script run from the Script component.

Syntax
LTRIM(character expression)

Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR

好了，我們檢視處理完後的結果，可以看到包含店的字串已經全部過濾出編號。

SELECT * FROM #t1


3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
人民路廣場3069
社群中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076

接著我們依次處理包含場、心、市、都、月、區、城、溝的字串，語句和處理結果如下：

SELECT *
FROM   #t1
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


人民路廣場3069
社群中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('場', name) + 1, Len(name)))) AS name
INTO   #t2
FROM   #t1


SELECT *
FROM   #t2
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


社群中心3070
珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('心', name) + 1, Len(name)))) AS name
INTO   #t3
FROM   #t2


SELECT *
FROM   #t3
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


珠海市3071
麗都3072
曉月3073
舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('市', name) + 1, Len(name)))) AS name
INTO   #t4
FROM   #t3


SELECT *
FROM   #t4
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


麗都3072
曉月3073
舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('都', name) + 1, Len(name)))) AS name
INTO   #t5
FROM   #t4


SELECT *
FROM   #t5
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


曉月3073
舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('月', name) + 1, Len(name)))) AS name
INTO   #t6
FROM   #t5


SELECT *
FROM   #t6
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


舊區3074
新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('區', name) + 1, Len(name)))) AS name
INTO   #t7
FROM   #t6


SELECT *
FROM   #t7
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


新城3075
水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('城', name) + 1, Len(name)))) AS name
INTO   #t8
FROM   #t7


SELECT *
FROM   #t8
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


水井溝3076


SELECT Rtrim(Ltrim(Substring(name, Charindex('溝', name) + 1, Len(name)))) AS name
INTO   #t9
FROM   #t8


SELECT *
FROM   #t9
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


--無記錄

這是最終的處理結果，過濾出編號後，我就可以利用這些編號和資料庫表進行關聯，獲得想要的資料。

SELECT *
INTO   #result
FROM   #t9


SELECT *
FROM   #result 


name
3059
3060
3061
3062
3063
3064
3065
3066
3067
3068
3069
3070
3071
3072
3073
3074
3075
3076


SELECT s.xxx,
       s.xxx
FROM   xx s
       JOIN #result r
         ON s.xxx = r.name
WHERE  s.xxx = 0;

四總結

本文過濾編號實際上核心程式碼就兩個，第一個是利用SQL Server的內建函式過濾出指定編號，語句如下：

SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS name
INTO   #t1
FROM   #temp

第二個是判斷是否包含中文，語句如下：

SELECT *
FROM   #t1
WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN

在工作中，發現和總結這些小技巧會讓你的工作事半功倍。

Good Luck!

字串擷取
2024-05-10
字串
sql常用函式詳解（一）——字串擷取
2018-08-17
SQL函式字串
python 資料處理（字串擷取、()\[]\{}資料型別、{}字典資料取值）
2018-08-08
Python字串資料型別
MySQL 字串函式：字串擷取
2018-04-09
MySql字串函式
jQuery字串擷取詳解
2018-12-31
jQuery字串
sql server對於日期的處理
2019-11-18
SQLServer
SQL Server 異常程式碼處理
2020-08-14
SQLServer
Golang 字串分割,替換和擷取 strings.Split
2021-05-12
Golang字串
SQL Server2008程式堵塞處理方法
2019-01-11
SQLServer
SQL server 特殊字元"u0000"處理
2024-08-27
SQLServer字元
php 擷取中英文混合字串
2019-02-16
PHP字串
Linux下的字串擷取詳解
2018-05-30
Linux字串
JavaScript 擷取指定指定區間字串
2019-11-19
JavaScript字串
Shell中的字串擷取介紹
2019-06-27
字串
shell 使用陣列及字串擷取
2019-02-01
陣列字串
C#常用字串擷取
2021-02-27
C#字串
Swift 4.0 字串擷取,拼接，字串富文字顯示
2018-04-03
Swift字串
SQL SERVER實用技巧
2020-04-04
SQLServer
Java 運算子詳解與字串處理技巧
2024-02-11
Java字串
字串擷取 slice，substr，substring 的區別
2020-10-29
字串
06.字元和字串處理
2024-07-09
字元字串
Java Sting類關於split處理空串以及擷取最大數
2018-04-24
Java
在 SQL Server 中使用 Try Catch 處理異常
2022-07-14
SQLServer
SQL Server 2008事件處理系統簡介LS
2022-03-21
SQLServer事件
SQL SERVER 日期格式化、日期和字串轉換
2018-07-04
SQLServer字串
SQL中的常用的字串處理函式大全
2018-12-17
SQL字串函式
MySQL 字串擷取相關函式總結
2018-04-09
MySql字串函式
Javascript之字串擷取函式slice()、substring()、substr()
2018-05-25
JavaScript字串函式
字串處理
2024-09-15
字串
擷取字串
2018-07-06
字串
字串和日期時間的處理
2020-11-30
字串
kubectl技巧之通過jsonpath擷取屬性
2019-06-19
JSON
js 英文中文混擷取相同個數
2020-04-05
JS
Java String類，字串常量池，建立方法,字串的獲取，擷取，轉換，分割。
2020-10-24
Java字串
SQL Server升級和遷移的三個技巧GZ
2022-03-21
SQLServer
JavaScript 擷取指定長度字串區分漢字和英文字元
2019-10-19
JavaScript字串字元
JavaScript 擷取字串
2019-05-02
JavaScript字串
SQL Server 如何合併組內字串
2020-05-27
SQLServer字串
JavaScript字串和時間處理隨筆
2024-03-15
JavaScript字串

SQL Server擷取字串和處理中文技巧

一 環境介紹

二 實現功能

三 實現模擬

四 總結

相關文章

一環境介紹

二實現功能

三實現模擬

四總結