End-to-End Tracing and Timeouts in go-zero

Posted by hzzyu on 2020-10-10

Most distributed tracing systems today descend from Google's paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". The mainstream ones include Zipkin, Pinpoint, SkyWalking, CAT, and Jaeger.

Paper link: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36356.pdf

How the go-zero framework implements end-to-end tracing

1. The httpServer implements a TracingHandler

func TracingHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		carrier, err := trace.Extract(trace.HttpFormat, r.Header)
		// ErrInvalidCarrier means no trace id was set in http header
		if err != nil && err != trace.ErrInvalidCarrier {
			logx.Error(err)
		}

		ctx, span := trace.StartServerSpan(r.Context(), carrier, sysx.Hostname(), r.RequestURI)
		defer span.Finish()
		r = r.WithContext(ctx)

		next.ServeHTTP(w, r)
	})
}

func StartServerSpan(ctx context.Context, carrier Carrier, serviceName, operationName string) (
	context.Context, tracespec.Trace) {
	span := newServerSpan(carrier, serviceName, operationName)
	return context.WithValue(ctx, tracespec.TracingKey, span), span
}

The key part of the span code: if the header carries no TraceID, a random one is generated.

func newServerSpan(carrier Carrier, serviceName, operationName string) tracespec.Trace {
	traceId := stringx.TakeWithPriority(func() string {
		if carrier != nil {
			return carrier.Get(traceIdKey)
		}
		return ""
	}, func() string {
		return stringx.RandId()
	})
	spanId := stringx.TakeWithPriority(func() string {
		if carrier != nil {
			return carrier.Get(spanIdKey)
		}
		return ""
	}, func() string {
		return initSpanId
	})

	return &Span{
		ctx: spanContext{
			traceId: traceId,
			spanId:  spanId,
		},
		serviceName:   serviceName,
		operationName: operationName,
		startTime:     timex.Time(),
		flag:          serverFlag,
	}
}
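The stringx.TakeWithPriority helper used above returns the first non-empty string its callbacks produce. A minimal, self-contained sketch of that behavior (the header key and the fallback id below are made up for illustration):

```go
package main

import "fmt"

// takeWithPriority mirrors stringx.TakeWithPriority: try each fn in
// order and return the first non-empty result.
func takeWithPriority(fns ...func() string) string {
	for _, fn := range fns {
		if v := fn(); len(v) > 0 {
			return v
		}
	}
	return ""
}

func main() {
	header := map[string]string{} // no incoming trace id in this request
	traceId := takeWithPriority(
		func() string { return header["X-Trace-ID"] }, // prefer the incoming id
		func() string { return "rand-7f3a" },          // fall back to a fresh one
	)
	fmt.Println(traceId) // prints "rand-7f3a"
}
```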

2. rpcClient

A TracingInterceptor is injected into gRPC's built-in interceptor chain; it creates a child span from the incoming ctx.


func (c *client) buildDialOptions(opts ...ClientOption) []grpc.DialOption {
	var clientOptions ClientOptions
	for _, opt := range opts {
		opt(&clientOptions)
	}

	options := []grpc.DialOption{
		grpc.WithInsecure(),
		grpc.WithBlock(),
		WithUnaryClientInterceptors(
			clientinterceptors.TracingInterceptor,
			clientinterceptors.DurationInterceptor,
			clientinterceptors.BreakerInterceptor,
			clientinterceptors.PrometheusInterceptor,
			clientinterceptors.TimeoutInterceptor(clientOptions.Timeout),
		),
	}
	for _, interceptor := range c.interceptors {
		options = append(options, WithUnaryClientInterceptors(interceptor))
	}

	return append(options, clientOptions.DialOptions...)
}
func TracingInterceptor(ctx context.Context, method string, req, reply interface{},
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	ctx, span := trace.StartClientSpan(ctx, cc.Target(), method)
	defer span.Finish()

	var pairs []string
	span.Visit(func(key, val string) bool {
		pairs = append(pairs, key, val)
		return true
	})
	ctx = metadata.AppendToOutgoingContext(ctx, pairs...)

	return invoker(ctx, method, req, reply, cc, opts...)
}

func StartClientSpan(ctx context.Context, serviceName, operationName string) (context.Context, tracespec.Trace) {
	if span, ok := ctx.Value(tracespec.TracingKey).(*Span); ok {
		return span.Fork(ctx, serviceName, operationName)
	}

	return ctx, emptyNoopSpan
}

func (s *Span) Fork(ctx context.Context, serviceName, operationName string) (context.Context, tracespec.Trace) {
	span := &Span{
		ctx: spanContext{
			traceId: s.ctx.traceId,
			spanId:  s.forkSpanId(),
		},
		serviceName:   serviceName,
		operationName: operationName,
		startTime:     timex.Time(),
		flag:          clientFlag,
	}
	return context.WithValue(ctx, tracespec.TracingKey, span), span
}

 

3. grpcServer

func UnaryTracingInterceptor(serviceName string) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			return handler(ctx, req)
		}

		carrier, err := trace.Extract(trace.GrpcFormat, md)
		if err != nil {
			return handler(ctx, req)
		}

		ctx, span := trace.StartServerSpan(ctx, carrier, serviceName, info.FullMethod)
		defer span.Finish()
		return handler(ctx, req)
	}
}

4. The end result

Logical call chain: httpServer -> grpcClient1 -> grpcServer1 -> grpcClient2 -> grpcServer2

The httpServer prints its trace log via the LogHandler middleware, grpcServer1 prints via the UnaryStatInterceptor, and grpcServer2 does the same. (The original log screenshots are not reproduced here.)

End-to-end timeout control. If the top of the chain sets a 10s timeout and grpcServer1 spends 3s, the rest of the chain only gets the remaining 7s. The implementation again rides on context propagation, enforced via WithDeadline.

1. The httpServer installs a TimeoutHandler middleware

func TimeoutHandler(duration time.Duration) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		if duration > 0 {
			return http.TimeoutHandler(next, duration, reason)
		} else {
			return next
		}
	}
}
2. The grpcClient installs a TimeoutInterceptor

func TimeoutInterceptor(timeout time.Duration) grpc.UnaryClientInterceptor {
	if timeout <= 0 {
		timeout = defaultTimeout
	}

	return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn,
		invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		ctx, cancel := contextx.ShrinkDeadline(ctx, timeout)
		defer cancel()
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}

3. The grpcServer installs a UnaryTimeoutInterceptor

func UnaryTimeoutInterceptor(timeout time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {
		ctx, cancel := contextx.ShrinkDeadline(ctx, timeout)
		defer cancel()
		return handler(ctx, req)
	}
}

func ShrinkDeadline(ctx context.Context, timeout time.Duration) (context.Context, func()) {
	if deadline, ok := ctx.Deadline(); ok {
		leftTime := time.Until(deadline)
		if leftTime < timeout {
			timeout = leftTime
		}
	}

	return context.WithDeadline(ctx, time.Now().Add(timeout))
}

 

Bilibili's client implementation


// Call invokes the named function, waits for it to complete, and returns its error status.
func (c *Client) Call(ctx context.Context, serviceMethod string, args interface{}, reply interface{}) (err error) {
	var (
		ok      bool
		code    string
		rc      *client
		call    *Call
		cancel  func()
		t       trace.Trace
		timeout = time.Duration(c.timeout)
	)
	if rc, ok = c.client.Load().(*client); !ok || rc == errClient {
		xlog.Error("client is errClient (no rpc client) by ping addr(%s) error", c.addr)
		return ErrNoClient
	}
	if t, ok = trace.FromContext(ctx); !ok {
		t = trace.New(serviceMethod)
	}
	t = t.Fork(_family, serviceMethod)
	t.SetTag(trace.String(trace.TagAddress, rc.remoteAddr))
	defer t.Finish(&err)
	// breaker
	brk := c.breaker.Get(serviceMethod)
	if err = brk.Allow(); err != nil {
		code = "breaker"
		stats.Incr(serviceMethod, code)
		return
	}
	defer c.onBreaker(brk, &err)
	// stat
	now := time.Now()
	defer func() {
		stats.Timing(serviceMethod, int64(time.Since(now)/time.Millisecond))
		if code != "" {
			stats.Incr(serviceMethod, code)
		}
	}()
	// timeout: get from conf
	// if context > conf use conf else context
	deliver := true
	if deadline, ok := ctx.Deadline(); ok {
		if ctimeout := time.Until(deadline); ctimeout < timeout {
			timeout = ctimeout
			deliver = false
		}
	}
	if deliver {
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	color := metadata.String(ctx, metadata.Color)
	remoteIP := metadata.String(ctx, metadata.RemoteIP)
	// call
	call = &Call{
		ServiceMethod: serviceMethod,
		Args:          args,
		Reply:         reply,
		Trace:         t,
		Color:         color,
		RemoteIP:      remoteIP,
		Timeout:       timeout,
	}
	rc.Do(call)
	select {
	case call = <-call.Done:
		err = call.Error
		code = ecode.Cause(err).Error()
	case <-ctx.Done():
		err = ecode.Deadline
		code = "timeout"
	}
	return
}

Bilibili's metadata keys

const (

	// Network

	RemoteIP   = "remote_ip"
	RemotePort = "remote_port"
	ServerAddr = "server_addr"
	ClientAddr = "client_addr"

	// Router

	Color = "color"

	// Trace

	Trace  = "trace"
	Caller = "caller"

	// Timeout

	Timeout = "timeout"

	// Dispatch

	CPUUsage = "cpu_usage"
	Errors   = "errors"
	Requests = "requests"

	// Mirror

	Mirror = "mirror"

	// Mid
	// user id of external (public-site) accounts

	Mid = "mid"

	// Uid
	// user_id on the internal manager platform

	Uid = "uid"

	// Username
	// username on the LDAP platform

	Username = "username"

	// Device
	Device = "device"

	// Cluster cluster info key
	Cluster = "cluster"
)

As you can see, Bilibili's approach is much the same. Mao Jian's talk covering this is at:

https://www.bilibili.com/video/BV1At411V7aT?p=5

 

Food for thought: user information and the like can also be passed through the context. Traffic coloring, rate limiting, and circuit breaking follow the same idea.

 

go-micro's approach (interface-based)

go-micro's trace plugins

Micro implements three trace integrations via Wrappers: awsxray, opencensus, and opentracing. The first is Amazon's AWS X-Ray.
opentracing is an open standard: a vendor-neutral API for adding tracing to applications and shipping trace data to a distributed tracing system. It has nearly become the industry standard.
opencensus is Google's open-source data-collection and distributed-tracing framework, covering similar ground to opentracing. It provides not only a specification but also per-language implementations and a wire protocol, and beyond tracing it adds metrics. opencensus can also export its data to other systems for analysis, such as Zipkin and Prometheus.


opencensus+zipkin
opentracing+zipkin
opentracing+Jaeger

Note: Zipkin is a distributed tracing system open-sourced by Twitter, with a UI that shows the status of each traced request.

Running Zipkin

docker run -d -p 9411:9411 openzipkin/zipkin
Then open host:9411 in a browser to see the Zipkin UI.

1. opencensus + zipkin

We use opencensus's trace feature here. The actual tracing is done by opencensus's trace package, and the collected data is then handed to Zipkin through the zipkin exporter.

1.1 Packages to import

import (
	...
	"go.opencensus.io/trace"
	"go.opencensus.io/exporter/zipkin"
	wrapperTrace "github.com/micro/go-plugins/wrapper/trace/opencensus"
	openzipkin "github.com/openzipkin/zipkin-go"
	zipkinHTTP "github.com/openzipkin/zipkin-go/reporter/http"
	...
)

1.2 Modify main.go in the order microservice

Create a TraceBoot function:

func TraceBoot() {
	apiURL := "http://192.168.0.111:9411/api/v2/spans"
	hostPort, _ := os.Hostname()
	serviceName := "go.micro.srv.order"

	localEndpoint, err := openzipkin.NewEndpoint(serviceName, hostPort)
	if err != nil {
		log.Fatalf("Failed to create the local zipkinEndpoint: %v", err)
	}
	reporter := zipkinHTTP.NewReporter(apiURL)
	ze := zipkin.NewExporter(reporter, localEndpoint)
	trace.RegisterExporter(ze)
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})
}

Wire it up in func main:

	// boot trace
	TraceBoot()

	// New Service
	service := grpc.NewService(
		micro.Name("go.micro.srv.order"),
		micro.Version("latest"),
		micro.Broker(b),
		micro.WrapHandler(wrapperTrace.NewHandlerWrapper()),
		micro.WrapClient(wrapperTrace.NewClientWrapper()),
	)

1.3 Modify the handlers in the order microservice

Pass the incoming context through the existing handlers. To recognize calls as belonging to the same request, the same traceId must be forwarded, and it lives in the context. Concretely, just replace context.TODO() in the existing code with ctx.

