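Preface
SkyWalking is a very good APM product, but it has one really painful problem in practice: with ES-based storage, as soon as something goes wrong with the ES data, the whole SkyWalking web UI becomes unavailable. You then have to stop the agent on every service, one service at a time, and redeploy each of them before things recover. The same problem comes up when upgrading SkyWalking to a new version. APM data of this kind is process data that can safely be discarded; by default, SkyWalking keeps trace records for only 90 minutes. So I decided to containerize the SkyWalking deployment so that it can be deployed and upgraded with one command. The rest of this post walks through the whole containerized deployment.
Goal: run the SkyWalking Docker image in a k8s cluster and serve from there
Building the Docker image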
FROM registry.cn-xx.xx.com/keking/jdk:1.8
ADD apache-skywalking-apm-incubating/ /opt/apache-skywalking-apm-incubating/
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo 'Asia/Shanghai' >/etc/timezone \
    && chmod +x /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh \
    && chmod +x /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh \
    && chmod +x /opt/apache-skywalking-apm-incubating/bin/startup.sh \
    && echo "tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log" >> /opt/apache-skywalking-apm-incubating/bin/startup.sh
EXPOSE 8080 10800 11800 12800
CMD /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh \
    && sh /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh \
    && /opt/apache-skywalking-apm-incubating/bin/startup.sh
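Two questions need answering when writing the Dockerfile. Which SkyWalking settings have to be configured dynamically, that is, set at runtime? And how do we keep the container's process running, given that SkyWalking's startup.sh behaves like Tomcat's startup.sh: it launches the services in the background and then returns?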
application.yml
#cluster:
#  zookeeper:
#    hostPort: localhost:2181
#    sessionTimeout: 100000
naming:
  jetty:
    #OS real network IP(binding required), for agent to find collector cluster
    host: 0.0.0.0
    port: 10800
    contextPath: /
cache:
#  guava:
  caffeine:
remote:
  gRPC:
    # OS real network IP(binding required), for collector nodes communicate with each other in cluster. collectorN --(gRPC) --> collectorM
    host: #real_host
    port: 11800
agent_gRPC:
  gRPC:
    #os real network ip(binding required), for agent to uplink data(trace/metrics) to collector. agent--(grpc)--> collector
    host: #real_host
    port: 11800
    # Set these two setting to open ssl
    #sslCertChainFile: $path
    #sslPrivateKeyFile: $path
    # Set your own token to active auth
    #authentication: xxxxxx
agent_jetty:
  jetty:
    # OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector through HTTP. agent--(HTTP)--> collector
    # SkyWalking native Java/.Net/node.js agents don't use this.
    # Open this for other implementor.
    host: 0.0.0.0
    port: 12800
    contextPath: /
analysis_register:
  default:
analysis_jvm:
  default:
analysis_segment_parser:
  default:
    bufferFilePath: ../buffer/
    bufferOffsetMaxFileSize: 10M
    bufferSegmentMaxFileSize: 500M
    bufferFileCleanWhenRestart: true
ui:
  jetty:
    # Stay in `localhost` if UI starts up in default mode.
    # Change it to OS real network IP(binding required), if deploy collector in different machine.
    host: 0.0.0.0
    port: 12800
    contextPath: /
storage:
  elasticsearch:
    clusterName: #elasticsearch_clusterName
    clusterTransportSniffer: true
    clusterNodes: #elasticsearch_clusterNodes
    indexShardsNumber: 2
    indexReplicasNumber: 0
    highPerformanceMode: true
    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: 2000 # Execute the bulk every 2000 requests
    bulkSize: 20 # flush the bulk every 20mb
    flushInterval: 10 # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: 2 # the number of concurrent requests
    # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
    traceDataTTL: 2880 # Unit is minute
    minuteMetricDataTTL: 90 # Unit is minute
    hourMetricDataTTL: 36 # Unit is hour
    dayMetricDataTTL: 45 # Unit is day
    monthMetricDataTTL: 18 # Unit is month
#storage:
#  h2:
#    url: jdbc:h2:~/memorydb
#    userName: sa
configuration:
  default:
    #namespace: xxxxx
    # alarm threshold
    applicationApdexThreshold: 2000
    serviceErrorRateThreshold: 10.00
    serviceAverageResponseTimeThreshold: 2000
    instanceErrorRateThreshold: 10.00
    instanceAverageResponseTimeThreshold: 2000
    applicationErrorRateThreshold: 10.00
    applicationAverageResponseTimeThreshold: 2000
    # thermodynamic
    thermodynamicResponseTimeStep: 50
    thermodynamicCountOfResponseTimeSteps: 40
    # max collection's size of worker cache collection, setting it smaller when collector OutOfMemory crashed.
    workerCacheMaxSize: 10000
#receiver_zipkin:
#  default:
#    host: localhost
#    port: 9411
#    contextPath: /
webapp.yml
server:
  port: 8080

collector:
  path: /graphql
  ribbon:
    ReadTimeout: 10000
    listOfServers: #real_host:10800

security:
  user:
    admin:
      password: #skywalking_password
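Dynamic configuration: the password, and every setting such as gRPC that must bind to the host's IP, has to be set at runtime. To do that, two configuration scripts run before SkyWalking's startup.sh. They replace the placeholder parameters (the #-prefixed values in the two files above) with environment variables that k8s sets at runtime.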
setApplicationEnv.sh
#!/usr/bin/env sh
sed -i "s/#elasticsearch_clusterNodes/${elasticsearch_clusterNodes}/g" /opt/apache-skywalking-apm-incubating/config/application.yml
sed -i "s/#elasticsearch_clusterName/${elasticsearch_clusterName}/g" /opt/apache-skywalking-apm-incubating/config/application.yml
sed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/config/application.yml
setWebAppEnv.sh
#!/usr/bin/env sh
sed -i "s/#skywalking_password/${skywalking_password}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.yml
sed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.yml
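The two scripts share one pattern: replace a `#name` placeholder with the value of the environment variable `name`. The sketch below pulls that pattern into a single helper and hardens it slightly. The function name `render_placeholder` is hypothetical and not part of the original scripts. It assumes GNU sed (for `-i`), refuses to run when the variable is unset or empty, and escapes `/`, `&` and `\` so that a password containing those characters does not break the substitution.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not in the original scripts): replace every "#NAME"
# placeholder in FILE with the value of the environment variable NAME.
# Usage: render_placeholder NAME FILE
render_placeholder() {
  name=$1
  file=$2
  # POSIX sh has no ${!name}, so look the variable up indirectly with eval.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "missing environment variable: $name" >&2
    return 1
  fi
  # Escape the characters that are special in a sed replacement: / & \
  escaped=$(printf '%s' "$value" | sed 's,[/&\\],\\&,g')
  # Same substitution as the original scripts, with the escaped value.
  sed -i "s/#${name}/${escaped}/g" "$file"
}
```

With this helper, setWebAppEnv.sh would reduce to two calls such as `render_placeholder skywalking_password /opt/apache-skywalking-apm-incubating/webapp/webapp.yml`, and a missing variable would stop the container at startup instead of leaving a placeholder in the config.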
Keeping the process alive: appending "tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log" to the end of SkyWalking's startup.sh keeps a foreground process running in the container and streams webapp.log to the container's output.
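One thing this approach depends on is that webapp.log already exists when `tail` starts; if the file is missing, `tail -f` exits and the container stops with it. The article does not report hitting this, so treat the following as a defensive sketch under that assumption. `wait_for_file` is a hypothetical helper that polls for the file and gives up after a timeout.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not in the original image): block until FILE exists,
# or fail after TIMEOUT seconds (default 30).
# Usage: wait_for_file FILE [TIMEOUT]
wait_for_file() {
  file=$1
  timeout=${2:-30}
  waited=0
  while [ ! -f "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Intended use as the last line of startup.sh, replacing the bare tail:
#   wait_for_file /opt/apache-skywalking-apm-incubating/logs/webapp.log 60 \
#     && exec tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log
```

Using `exec` makes `tail` replace the shell, so the container's main process is the one producing the log output.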
Deploying on Kubernetes
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: skywalking
  namespace: uat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: skywalking
  template:
    metadata:
      labels:
        app: skywalking
    spec:
      imagePullSecrets:
      - name: registry-pull-secret
      nodeSelector:
        apm: skywalking
      containers:
      - name: skywalking
        image: registry.cn-xx.xx.com/keking/kk-skywalking:5.2
        imagePullPolicy: Always
        env:
        - name: elasticsearch_clusterName
          value: elasticsearch
        - name: elasticsearch_clusterNodes
          value: 172.16.16.129:31300
        - name: skywalking_password
          value: xxx
        - name: real_host
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        resources:
          limits:
            cpu: 1000m
            memory: 4Gi
          requests:
            cpu: 700m
            memory: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: skywalking
  namespace: uat
  labels:
    app: skywalking
spec:
  selector:
    app: skywalking
  ports:
  - name: web-a
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: web-b
    port: 10800
    targetPort: 10800
    nodePort: 31181
  - name: web-c
    port: 11800
    targetPort: 11800
    nodePort: 31182
  - name: web-d
    port: 12800
    targetPort: 12800
    nodePort: 31183
  type: NodePort
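The only thing to watch in the Kubernetes manifest is how env obtains the pod IP. Several addresses in SkyWalking must be bound to the container's real IP, and the manifest passes that IP into the container as the real_host environment variable, taken from status.podIP.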
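Because the wildcard address 0.0.0.0 is not usable for these bindings, a pre-start check on `real_host` can catch a misconfigured manifest early. This is a minimal sketch, not something the original image does: `is_concrete_ipv4` is a hypothetical function that accepts only a dotted IPv4 address with four octets in the range 0 to 255 and rejects 0.0.0.0.

```shell
#!/usr/bin/env sh
# Hypothetical pre-start check (not in the original image): succeed only if $1
# is a concrete IPv4 address, i.e. four octets in 0-255 and not 0.0.0.0.
is_concrete_ipv4() {
  ip=$1
  [ "$ip" != "0.0.0.0" ] || return 1
  # Reject empty input, non-digit characters, and empty octets.
  case $ip in
    ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into the positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}
```

A line such as `is_concrete_ipv4 "$real_host" || exit 1` at the top of setApplicationEnv.sh would make the container fail fast when the variable is missing or set to the wildcard address.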
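Closing remarks
Containerizing the SkyWalking deployment took about a day from first test to a usable setup. More than an hour of that went into trying out Tan's skywalking-docker image (https://hub.docker.com/r/wutang/skywalking-docker/). One of its scripts had a permission problem (Tan says it has since been fixed, but I have not had time to test it), and a few things in it were hard for me to control, so I built my own Docker image. The biggest problem was network communication inside the cluster. At first I set every service IP in SkyWalking to 0.0.0.0 and exposed the ports through the cluster's NodePort. With that setup an agent could reach the naming service at the cluster IP on port 31181, but the collector gRPC address it got back from the naming service was 0.0.0.0:11800, which an agent cannot use to reach the collector. Binding to the pod IP solved the problem.
Original article: http://www.kailing.pub/article/index/arcid/221.html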