Nginx HTTPS on AWS

Posted on 2019-02-11 | Edited on 2019-03-09 | In linux

Install nginx on Amazon Linux

sudo amazon-linux-extras install nginx1.12

Apply for a certificate

Apply at wanwang.aliyun.com; after a few minutes, download the nginx version of the certificate package.

Configure nginx

In /etc/nginx/nginx.conf, add the server configuration:

server {
    listen 443;
    server_name abc.com;                # your domain
    ssl on;
    root /home/ec2-user/www/abc.com;    # page files
    index index.html index.htm;         # index
    ssl_certificate cert/214292799730473.pem;       # certificate file
    ssl_certificate_key cert/214292799730473.key;   # certificate key file
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    location / {
        index index.html index.htm;
    }
}

server {
    listen 80;
    server_name bjubi.com;              # your domain
    rewrite ^(.*)$ https://$host$1 permanent;   # redirect http to https
}

Start nginx

# check config file
sudo /usr/sbin/nginx -t

# reload nginx
sudo /usr/sbin/nginx -s reload

Some useful commands

# start manually
sudo /usr/sbin/nginx -c /etc/nginx/nginx.conf
sudo service nginx stop      # stop
sudo service nginx start     # start
sudo service nginx restart   # restart

https://www.cnblogs.com/tianhei/p/7726505.html

Regex in repository

Posted on 2019-02-11 | Edited on 2019-03-09 | In regex

URL match; the suffix part may contain special characters that fail to match:

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

A more concise URL match; the suffix part is matched loosely rather than precisely:

[\w\-_/:%]+(\.[\w\-_]+)+([\u0021-\u007e]*)?

Find a Linux line break that is not followed by '320500':

\n(?!320500)

Find a Windows line break that is not followed by 'create', i.e. \r\n(?!create):

(?!(\r\ncreate))\r\n

Find line breaks that are not followed by an 'address line' and remove them; then strip the text before and after, leaving the URL:

(?!(\n.*java\.html))\n.*
(?<=(\.html))".*
.*(?<=")

Balancing group matching (???):

(?'Open'\()[2-9][0-9]{2}(?'-Open'\))\s?[0-9]{3}-[0-9]{4}(?(Open)(?!))|[2-9][0-9]{2}(-|\.)[0-9]{3}\1[0-9]{4}
<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>

Format a single-line XML message:

(?<=>)(?=<(?!\/))

Find gaps that are not inside a word:

(?<![a-zA-Z])|(?![a-zA-Z])

Find line breaks that are followed by a blank line:

\r\n(?<=^$)

Match Chinese characters:

[\u4e00-\u9fa5]

Match double-byte characters (including Chinese characters):

[^\x00-\xff]

Match blank lines:

\n\s*\r

Match email addresses:

[\w!#$%&'*+/=?^_`{|}~-]+(?:\.[\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\w](?:[\w-]*[\w])?\.)+[\w](?:[\w-]*[\w])?

Match URLs:

[a-zA-Z]+://[^\s]*

Match Chinese domestic phone numbers:

\d{3}-\d{8}|\d{4}-\d{7,8}

Match QQ numbers:

[1-9][0-9]{4,}

Match Chinese postal codes:

[1-9]\d{5}(?!\d)

Match 18-digit ID card numbers:

^(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]|X)$

Match dates in (year-month-day) format:

([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8])))

Match positive integers:

^[1-9]\d*$

Match negative integers:

^-[1-9]\d*$

Match integers:

^-?[1-9]\d*$

Match non-negative integers (positive integers and 0):

^([1-9]\d*|0)$

Match non-positive integers (negative integers and 0):

^(-[1-9]\d*|0)$

Match positive floating-point numbers:

^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$

Match negative floating-point numbers:

^(-[1-9]\d*\.\d*|-0\.\d*[1-9]\d*)$
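
For a quick check of how these patterns behave, a small Java sketch (the sample numbers are made up) using the fixed domestic phone-number pattern from the list above:

import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        Pattern phone = Pattern.compile("\\d{3}-\\d{8}|\\d{4}-\\d{7,8}");
        System.out.println(phone.matcher("010-12345678").matches()); // true
        System.out.println(phone.matcher("0571-1234567").matches()); // true
        System.out.println(phone.matcher("12-345").matches());       // false
    }
}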

Wireless Raspberry Pi audio

Posted on 2019-01-26 | Edited on 2020-09-17 | In raspberry

System

https://www.raspberrypi.org/downloads/
Install Raspbian Stretch

  • Version: November 2018
  • Release date: 2018-11-13
  • Kernel version: 4.14

Switch apt sources to a mirror

sudo nano /etc/apt/sources.list

deb http://mirrors.tuna.tsinghua.edu.cn/raspbian/raspbian/ stretch main contrib non-free rpi
deb-src http://mirrors.tuna.tsinghua.edu.cn/raspbian/raspbian/ stretch main contrib non-free rpi

sudo nano /etc/apt/sources.list.d/raspi.list

deb http://mirror.tuna.tsinghua.edu.cn/raspberrypi/ stretch main ui
deb-src http://mirror.tuna.tsinghua.edu.cn/raspberrypi/ stretch main ui

sudo apt-get update

Configure Wi-Fi & SSH

sudo nano /etc/wpa_supplicant/wpa_supplicant.conf

network={
    ssid="SSID"
    key_mgmt=WPA-PSK
    psk="PASSWD"
}

ifconfig wlan0
sudo ifdown wlan0
sudo ifup wlan0

# or
sudo reboot

Enable the SSH service

sudo raspi-config

Enable SSH in the configuration menu.

Socket error Event: 32 Error: 10053

If the terminal throws this error, check the SSH service:

sudo sshd -t

If the host key does not exist:

ssh-keygen -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key

This generates and configures the RSA host key in /etc/ssh.

If permissions are not allowed:

sudo chmod 600 /etc/ssh/*

service ssh restart

Add the command to /etc/rc.local before exit 0:

/etc/init.d/ssh start

Install A2DP

git clone https://github.com/bareinhard/super-simple-raspberry-pi-audio-receiver-install
cd super-simple-raspberry-pi-audio-receiver-install
git checkout stretch-fix
sudo ./install.sh
# install home version
2. Install the Raspberry Pi Audio Receiver Home Installation

# default name
Do you want all the Devices to use the same name? (y/n) : Choose y

# default passwd
Device WiFi Password: Choose Password

# no sound card
0. No Sound Card

Install create_ap

git clone https://github.com/oblique/create_ap.git
cd create_ap
sudo make install
sudo apt-get install util-linux procps hostapd iproute2 iw haveged dnsmasq

Add the command to /etc/rc.local before exit 0:

nohup sudo create_ap -n wlan0 raspberry raspberry --no-virt > /dev/null 2>&1 &

raspberry_ios_audio.png

Here I use an AP without Internet sharing.
If the network environment is good and the Raspberry Pi has Internet access, add eth0 to the command:

create_ap -m bridge wlan0 eth0 MyAccessPoint MyPassPhrase

Enjoy iOS AirPlay :)

http://shumeipai.nxez.com/2018/12/28/install-a2dp-to-turn-the-raspberry-pi-into-a-bluetooth-speaker.html
https://blog.csdn.net/huayucong/article/details/51376506
https://blog.csdn.net/newtonsm/article/details/78859152
https://github.com/bareinhard/super-simple-raspberry-pi-audio-receiver-install
https://github.com/oblique/create_ap

Vector group (1)

Posted on 2019-01-20 | Edited on 2020-06-16 | In math

Given two vector groups, prove that group B can be linearly represented by group A, but group A cannot be linearly represented by group B.

R(A, B) = R(A),
so B can be linearly represented by A.

A cannot be linearly represented by B.

Given vector groups A and B, prove that A and B are equivalent.

By inspection, R(A) = 2.

R(A) = R(B) = R(A, B), so A and B are equivalent.

For what values of a is the following vector group linearly dependent?

Row-reduced form:

The vector group is linearly dependent iff R(A) < 3, which gives:

a = -1 or a = 2

Suppose a1, a2 are linearly independent and a1+b, a2+b are linearly dependent; express b in terms of a1 and a2.

Suppose b can be represented as

Since a1+b and a2+b are linearly dependent,

we have

Let x = c; then y = -(1+c), which gives
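
A sketch of the omitted working in LaTeX, assuming b = x a1 + y a2 as above:

\begin{aligned}
&k_1(a_1+b)+k_2(a_2+b)=0,\quad k_1,k_2 \text{ not both zero}\\
&\Longrightarrow \bigl(k_1+(k_1+k_2)x\bigr)a_1+\bigl(k_2+(k_1+k_2)y\bigr)a_2=0\\
&\Longrightarrow k_1+(k_1+k_2)x=0,\quad k_2+(k_1+k_2)y=0 \quad(\text{since } a_1,a_2 \text{ are independent})\\
&\Longrightarrow (k_1+k_2)(1+x+y)=0 \Longrightarrow 1+x+y=0 \quad(k_1+k_2=0 \text{ would force } k_1=k_2=0)\\
&\Longrightarrow y=-(1+x),\quad \text{so with } x=c:\; b = c\,a_1-(1+c)\,a_2
\end{aligned}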

https://katex.org/#demo
https://katex.org/docs/supported.html

Phpstorm debug setting

Posted on 2018-12-23 | Edited on 2020-06-17 | In php

Xdebug official website

Windows: download the DLL matching your localhost/phpinfo.php output:
https://xdebug.org/download.php
Linux: download via the wizard:
https://xdebug.org/wizard.php

Edit php.ini

Add the Xdebug settings at the end of php.ini:

[xdebug]
zend_extension = "D:/phpStudy/php56n/php_xdebug-2.2.7-5.6-vc11-nts-x86_64.dll"

xdebug.remote_enable = On
xdebug.profiler_enable = On
xdebug.profiler_enable_trigger = On
xdebug.profiler_output_name = cachegrind.out.%t.%p
xdebug.profiler_output_dir ="D:/phpStudy/php56n/tmp"
xdebug.show_local_vars=0

xdebug.var_display_max_children=128
xdebug.var_display_max_data=512
xdebug.var_display_max_depth=5

xdebug.idekey=PhpStorm
xdebug.remote_enable = On
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000
xdebug.remote_handler=dbgp

Check that the Xdebug module is loaded with php -m or php -i.

Browser setting

Chrome: install Xdebug helper.
Firefox: install The easiest Xdebug.

Set the IDE key to the same value as the idekey property in php.ini.

PhpStorm settings

Set the PHP CLI interpreter:
20181223185642.png

Set the debug listener server IP or domain:
20181223185503.png

Set the DBGp key/port (same as the properties in php.ini):
20181223185452.png

Build a local PHP debug server (skip if a remote server exists):
20181223185714.png

Add a server listener to handle debug sessions in PHP:
20181223185724.png

Fundamentals

20181223194910.png

https://www.cnblogs.com/anyeshe/p/5746404.html

https://www.cnblogs.com/dongruiha/p/6739838.html

spring-kafka NoSuchMethodError exception

Posted on 2018-11-24 | Edited on 2018-12-16 | In db

java.lang.NoSuchMethodError: org.springframework.messaging.handler.annotation.support.MessageMethodArgumentResolver: method <init>()V not found
    at org.springframework.kafka.annotation.KafkaListenerAnnotationBeanPostProcessor$KafkaHandlerMethodFactoryAdapter.createDefaultMessageHandlerMethodFactory(KafkaListenerAnnotationBeanPostProcessor.java:639)
    at org.springframework.kafka.annotation.KafkaListenerAnnotationBeanPostProcessor$KafkaHandlerMethodFactoryAdapter.getMessageHandlerMethodFactory(KafkaListenerAnnotationBeanPostProcessor.java:616)
    at org.springframework.kafka.annotation.KafkaListenerAnnotationBeanPostProcessor$KafkaHandlerMethodFactoryAdapter.createInvocableHandlerMethod(KafkaListenerAnnotationBeanPostProcessor.java:611)
    at org.springframework.kafka.config.MethodKafkaListenerEndpoint.configureListenerAdapter(MethodKafkaListenerEndpoint.java:111)
    at org.springframework.kafka.config.MethodKafkaListenerEndpoint.createMessageListener(MethodKafkaListenerEndpoint.java:97)
    at org.springframework.kafka.config.MethodKafkaListenerEndpoint.createMessageListener(MethodKafkaListenerEndpoint.java:40)
    at org.springframework.kafka.config.AbstractKafkaListenerEndpoint.setupMessageListener(AbstractKafkaListenerEndpoint.java:277)
    at org.springframework.kafka.config.AbstractKafkaListenerEndpoint.setupListenerContainer(AbstractKafkaListenerEndpoint.java:262)
    at org.springframework.kafka.config.AbstractKafkaListenerContainerFactory.createListenerContainer(AbstractKafkaListenerContainerFactory.java:188)
    at org.springframework.kafka.config.AbstractKafkaListenerContainerFactory.createListenerContainer(AbstractKafkaListenerContainerFactory.java:46)
    at org.springframework.kafka.config.KafkaListenerEndpointRegistry.createListenerContainer(KafkaListenerEndpointRegistry.java:182)
    at org.springframework.kafka.config.KafkaListenerEndpointRegistry.registerListenerContainer(KafkaListenerEndpointRegistry.java:154)
    at org.springframework.kafka.config.KafkaListenerEndpointRegistry.registerListenerContainer(KafkaListenerEndpointRegistry.java:128)
    at org.springframework.kafka.config.KafkaListenerEndpointRegistrar.registerAllEndpoints(KafkaListenerEndpointRegistrar.java:138)
    at org.springframework.kafka.config.KafkaListenerEndpointRegistrar.afterPropertiesSet(KafkaListenerEndpointRegistrar.java:132)
    at org.springframework.kafka.annotation.KafkaListenerAnnotationBeanPostProcessor.afterSingletonsInstantiated(KafkaListenerAnnotationBeanPostProcessor.java:224)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:796)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:861)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:541)
    at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:122)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:759)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:369)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:313)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1185)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1174)

Exclude spring-messaging from the spring-kafka dependency in the pom (an <exclusions> entry for org.springframework:spring-messaging), then declare spring-messaging explicitly:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-messaging</artifactId>
    <version>4.3.9.RELEASE</version>
</dependency>

https://www.oschina.net/question/2417189_2247108
https://blog.csdn.net/wangshuminjava/article/details/80241922

Spring Boot KafkaListener concurrent

Posted on 2018-11-22 | Edited on 2018-12-16 | In db

20181122185719971.png

Concurrent consumption

ConcurrentKafkaListenerContainerFactory<String, String> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
factory.setConcurrency(7);

Increasing the number of concurrent consumer threads can increase throughput.
Be careful: the topic's partition count may become the bottleneck.
The number of workers in a consumer group should not be larger than the partition count, otherwise threads are wasted.

The slowest part of the system defines its overall performance.
If the consumer's downstream processing is slow, threads may block for a while.
In the worst case the session can time out and offsets get reset.
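
For context, a minimal wiring sketch (the bean name demoListenerFactory and the topic demo are assumptions for illustration, not from this post) showing how such a factory is attached to a listener:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.stereotype.Component;

@Configuration
@EnableKafka
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> demoListenerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // keep concurrency <= partition count, otherwise the extra threads sit idle
        factory.setConcurrency(7);
        return factory;
    }
}

@Component
class DemoListener {

    @KafkaListener(topics = "demo", containerFactory = "demoListenerFactory")
    public void onMessage(String message) {
        // downstream processing; if this is slow, threads block and the session may time out
    }
}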

Consume in batches

ConcurrentKafkaListenerContainerFactory<String, String> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
factory.setBatchListener(true);
factory.getContainerProperties().setPollTimeout(pollTimeout);

consumerConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);

The default max.poll.interval.ms is 300000; with max.poll.records=50 every batch fetches fifty messages.

This may trigger the warning Auto offset commit failed:

Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms,
which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

You can reduce max.poll.records, or increase session.timeout.ms.
heartbeat.interval.ms must be lower than session.timeout.ms, and is usually set to 1/3 of the timeout value.

Change enable.auto.commit to false and use spring-kafka's internal mechanism to manage message commits.
If enable.auto.commit is true and the processing time is shorter than auto.commit.interval.ms,
the ack (commit) waits for the next commit cycle.
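
A hedged sketch of a batch listener with manual commits (the topic demo and factory name batchListenerFactory are assumptions; it presumes enable.auto.commit=false and the container's ack mode set to MANUAL):

import java.util.List;

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class BatchDemoListener {

    @KafkaListener(topics = "demo", containerFactory = "batchListenerFactory")
    public void onBatch(List<String> messages, Acknowledgment ack) {
        for (String message : messages) {
            // process each record; keep this fast enough to stay inside max.poll.interval.ms
        }
        // commit the whole batch only after processing succeeds
        ack.acknowledge();
    }
}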

Increase partitions

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --partitions 30 --topic demo

Partitions can only be increased, never decreased; increasing the consumer group's worker count along with the partitions raises throughput.
If the Kafka cluster runs on three machines, the cluster has 3 brokers; create the demo topic with 3 partitions.
For high availability, give every partition a replication factor of 2; then every broker holds two Kafka log directories.

If one machine dies, the other two can still serve the topic.
If the replication factor is set to 3, the topic keeps working even when two machines die.

replication-factor 2:
brokerA partition0/partition1
brokerB partition1/partition2
brokerC partition2/partition0

replication-factor 3:
brokerA partition0/partition1/partition2
brokerB partition1/partition2/partition0
brokerC partition2/partition0/partition1


modify replication-factor
https://blog.csdn.net/russle/article/details/83421904

optimize consumer
https://docs.spring.io/spring-kafka/reference/html/_reference.html
https://blog.csdn.net/zwgdft/article/details/54633105

Warn:
https://www.jianshu.com/p/4e00dff97f39
https://blog.csdn.net/zwx19921215/article/details/83269445

How to choose the number of topics/partitions in a Kafka cluster?

Posted on 2018-11-12 | Edited on 2018-12-16 | In db

This is a common question asked by many Kafka users.
The goal of this post is to explain a few important determining factors and provide a few simple formulas.

More Partitions Lead to Higher Throughput

The first thing to understand is that a topic partition is the unit of parallelism in Kafka.
On both the producer and the broker side, writes to different partitions can be done fully in parallel.
So expensive operations such as compression can utilize more hardware resources.
On the consumer side, Kafka always gives a single partition’s data to one consumer thread.
Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed.
Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.

A rough formula for picking the number of partitions is based on throughput.
You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c).
Let’s say your target throughput is t.
Then you need to have at least max(t/p, t/c) partitions.
The per-partition throughput that one can achieve on the producer depends on configurations such as the batching size, compression codec, type of acknowledgement, replication factor, etc.
However, in general, one can produce at 10s of MB/sec on just a single partition as shown in this benchmark.
The consumer throughput is often application dependent since it corresponds to how fast the consumer logic can process each message.
So, you really need to measure it.
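
As an illustration with made-up numbers (not from the article): if a single partition sustains p = 20 MB/s on produce and c = 40 MB/s on consume, and the target is t = 400 MB/s, you need at least max(400/20, 400/40) = 20 partitions. The same arithmetic as a tiny Java sketch:

public class PartitionSizing {
    public static void main(String[] args) {
        double t = 400.0; // target throughput in MB/s (assumed)
        double p = 20.0;  // measured single-partition produce throughput (assumed)
        double c = 40.0;  // measured single-partition consume throughput (assumed)
        long partitions = Math.max((long) Math.ceil(t / p), (long) Math.ceil(t / c));
        System.out.println("at least " + partitions + " partitions"); // prints: at least 20 partitions
    }
}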

Although it’s possible to increase the number of partitions over time, one has to be careful if messages are produced with keys.
When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key.
This provides a guarantee that messages with the same key are always routed to the same partition.
This guarantee can be important for certain applications since messages within a partition are always delivered in order to the consumer.
If the number of partitions changes, such a guarantee may no longer hold.
To avoid this situation, a common practice is to over-partition a bit.
Basically, you determine the number of partitions based on a future target throughput, say for one or two years later.
Initially, you can just have a small Kafka cluster based on your current throughput.
Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online).
This way, you can keep up with the throughput growth without breaking the semantics in the application when keys are used.

In addition to throughput, there are a few other factors that are worth considering when choosing the number of partitions.
As you will see, in some cases, having too many partitions may also have negative impact.

More Partitions Requires More Open File Handles

Each partition maps to a directory in the file system in the broker.
Within that log directory, there will be two files (one for the index and another for the actual data) per log segment.
Currently, in Kafka, each broker opens a file handle of both the index and the data file of every log segment.
So, the more partitions, the higher that one needs to configure the open file handle limit in the underlying operating system.
This is mostly just a configuration issue.
We have seen production Kafka clusters running with more than 30 thousand open file handles per broker.

More Partitions May Increase Unavailability

Kafka supports intra-cluster replication, which provides higher availability and durability.
A partition can have multiple replicas, each stored on a different broker.
One of the replicas is designated as the leader and the rest of the replicas are followers.
Internally, Kafka manages all those replicas automatically and makes sure that they are kept in sync.
Both the producer and the consumer requests to a partition are served on the leader replica.
When a broker fails, partitions with a leader on that broker become temporarily unavailable.
Kafka will automatically move the leader of those unavailable partitions to some other replicas to continue serving the client requests.
This process is done by one of the Kafka brokers designated as the controller.
It involves reading and writing some metadata for each affected partition in ZooKeeper.
Currently, operations to ZooKeeper are done serially in the controller.

In the common case when a broker is shut down cleanly, the controller will proactively move the leaders off the shutting down broker one at a time.
The moving of a single leader takes only a few milliseconds.
So, from the clients perspective, there is only a small window of unavailability during a clean broker shutdown.

However, when a broker is shut down uncleanly (e.g., kill -9), the observed unavailability could be proportional to the number of partitions.
Suppose that a broker has a total of 2000 partitions, each with 2 replicas.
Roughly, this broker will be the leader for about 1000 partitions.
When this broker fails uncleanly, all those 1000 partitions become unavailable at exactly the same time.
Suppose that it takes 5 ms to elect a new leader for a single partition.
It will take up to 5 seconds to elect the new leader for all 1000 partitions.
So, for some partitions, their observed unavailability can be 5 seconds plus the time taken to detect the failure.

If one is unlucky, the failed broker may be the controller.
In this case, the process of electing the new leaders won’t start until the controller fails over to a new broker.
The controller failover happens automatically, but requires the new controller to read some metadata for every partition from ZooKeeper during initialization.
For example, if there are 10,000 partitions in the Kafka cluster and initializing the metadata from ZooKeeper takes 2 ms per partition, this can add 20 more seconds to the unavailability window.

In general, unclean failures are rare.
However, if one cares about availability in those rare cases, it’s probably better to limit the number of partitions per broker to two to four thousand and the total number of partitions in the cluster to low tens of thousand.

More Partitions May Increase End-to-end Latency

The end-to-end latency in Kafka is defined by the time from when a message is published by the producer to when the message is read by the consumer.
Kafka only exposes a message to a consumer after it has been committed, i.e., when the message is replicated to all the in-sync replicas.
So, the time to commit a message can be a significant portion of the end-to-end latency.
By default, a Kafka broker only uses a single thread to replicate data from another broker, for all partitions that share replicas between the two brokers.
Our experiments show that replicating 1000 partitions from one broker to another can add about 20 ms latency, which implies that the end-to-end latency is at least 20 ms.
This can be too high for some real-time applications.

Note that this issue is alleviated on a larger cluster.
For example, suppose that there are 1000 partition leaders on a broker and there are 10 other brokers in the same Kafka cluster.
Each of the remaining 10 brokers only needs to fetch 100 partitions from the first broker on average.
Therefore, the added latency due to committing a message will be just a few ms, instead of tens of ms.

As a rule of thumb, if you care about latency, it’s probably a good idea to limit the number of partitions per broker to 100 x b x r, where b is the number of brokers in a Kafka cluster and r is the replication factor.

More Partitions May Require More Memory In the Client

In the most recent 0.8.2 release which we ship with the Confluent Platform 1.0, we have developed a more efficient Java producer.
One of the nice features of the new producer is that it allows users to set an upper bound on the amount of memory used for buffering incoming messages.
Internally, the producer buffers messages per partition.
After enough data has been accumulated or enough time has passed, the accumulated messages are removed from the buffer and sent to the broker.

If one increases the number of partitions, messages will be accumulated in more partitions in the producer.
The aggregate amount of memory used may now exceed the configured memory limit.
When this happens, the producer has to either block or drop any new message, neither of which is ideal.
To prevent this from happening, one will need to reconfigure the producer with a larger memory size.

As a rule of thumb, to achieve good throughput, one should allocate at least a few tens of KB per partition being produced in the producer and adjust the total amount of memory if the number of partitions increases significantly.

A similar issue exists in the consumer as well.
The consumer fetches a batch of messages per partition.
The more partitions that a consumer consumes, the more memory it needs.
However, this is typically only an issue for consumers that are not real time.

Summary

In general, more partitions in a Kafka cluster leads to higher throughput.
However, one does have to be aware of the potential impact of having too many partitions in total or per broker on things like availability and latency.
In the future, we do plan to improve some of those limitations to make Kafka more scalable in terms of the number of partitions.

https://blog.csdn.net/kwengelie/article/details/51150114
https://www.confluent.io/blog/author/jun-rao/

kafka consumer group

Posted on 2018-11-06 | Edited on 2018-12-16 | In db

Consumer group

How can multiple workers in the same consumer group all receive the same topic's messages?

20181106190805256.png
In Kafka, each partition of a topic is delivered to only one worker in the consumer group (randomly chosen?).

It's better to set the partition count to a multiple of the number of consumers in the group.

So if you need publish/subscribe mode, make sure the consumers use different group ids, as in the sketch below.
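
A minimal sketch, assuming a demo topic and spring-kafka's @KafkaListener (the group ids are made up): the two listeners below sit in different groups, so each receives every message, while listeners sharing one group id would split the partitions instead.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class PubSubListeners {

    @KafkaListener(topics = "demo", groupId = "group-a")
    public void groupA(String message) {
        // every message published to "demo" arrives here
    }

    @KafkaListener(topics = "demo", groupId = "group-b")
    public void groupB(String message) {
        // ...and also here, because this consumer belongs to a different group
    }
}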

How do I choose the number of partitions for a topic?

The partition count determines the maximum consumer parallelism and so
you should set a partition count based on the maximum consumer parallelism you would expect to need

Partition

  • A partition is basically a directory of log files.
  • Each partition must fit entirely on one machine.
  • Each partition is totally ordered.
  • Each partition is not consumed by more than one consumer thread/process in each consumer group.
  • Many partitions can be consumed by a single process (randomly assigned?).
  • Another way to say the above is that the partition count is a bound on the maximum consumer parallelism.
    Parallelism here means that different consumers can consume different partitions.
  • More partitions mean more files and more per-machine constraints.
  • Each partition corresponds to several znodes in ZooKeeper.
  • More partitions mean a longer leader fail-over time; synchronizing thousands of znodes is not an easy job.
  • The more partitions, the more expensive position checkpointing is.
  • Expanding to more partitions requires manually synchronizing data from the old partitions.

https://www.oschina.net/question/2558468_2145935
https://blog.csdn.net/gezilan/article/details/80412490
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave?

Hello solr

Posted on 2018-10-14 | Edited on 2018-12-16 | In search

20180410191304255.png

Install

  1. Install JDK 1.8 and source the profile.
  2. Install Solr 7.5: tar -zxvf the package.
  3. Try to start Solr with ./bin/solr start;
    if using the root user, try ./bin/solr start -force.

Then the IP + port can be used to visit Solr's administration user interface.

Add core

Creating a core directly in the administration user interface will raise a create error.

Copy the official config files into the new core directory, then create the core in the interface again.

cd /usr/solr-7.5.0/server/solr
mkdir demo
cp -r configsets/_default/conf/ demo

Or create it from the command line:

./bin/solr create -c demo

Data import

DB configuration

Create the file data.xml in solr-7.5.0/server/solr/demo/conf.
column in field represents the field in the DB;
name in field represents the unique field in Solr.

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://127.0.0.1:3306/demo"
                user="demo"
                password="demo"/>
    <document>
        <entity name="demo" query="select * from demo" >
            <field column="Name" name="Name"/>
            <field column="Age" name="Age"/>
            <field column="Price" name="Price"/>
        </entity>
    </document>
</dataConfig>

Modify solrconfig

Modify solrconfig.xml to add the data import request handler:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
        <str name="config">data.xml</str>
    </lst>
</requestHandler>

Modify managed-schema

Modify managed-schema to add the data fields used in searching:

<field name="Name" type="string" indexed="true" stored="true"/>
<field name="Age" type="pint" indexed="false" stored="true"/>
<field name="Price" type="pdouble" indexed="false" stored="true"/>

Add jars

cd /usr/solr-7.5.0
cp -r dist/solr-dataimporthandler-* server/solr-webapp/webapp/WEB-INF/lib/
cd server/solr-webapp/webapp/WEB-INF/lib/
curl -O http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar

Restart Solr with ./bin/solr restart -force,
then import the data in the admin interface.

Chinese word segmentation

https://search.maven.org/search?q=g:com.github.magese

curl -O https://search.maven.org/remotecontent?filepath=com/github/magese/ik-analyzer/7.5.0/ik-analyzer-7.5.0.jar
cd /usr/solr-7.5.0/server/solr/demo/conf

Modify managed-schema again:

+++++++++++++
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" conf="ik.conf"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<field name="Name" type="text_ik" indexed="true" stored="true" multiValued="true" />
+++++++++++++

-------------
<field name="Name" type="string" indexed="true" stored="true"/>
-------------

Restart Solr; the Name field can then be tokenized.

Delete data

Delete in the interface

In the admin interface, open the core's Documents page and use the XML document type:

<delete><query>id:1</query></delete>
<delete><query>*:*</query></delete>
<commit/>

Delete via GET

http://localhost:8080/solr/update/?stream.body=<delete><id>1</id></delete>&stream.contentType=text/xml;charset=utf-8&commit=true
http://localhost:8080/solr/update/?stream.body=<delete><query>Name:Turbo</query></delete>&stream.contentType=text/xml;charset=utf-8&commit=true

Delete via POST

curl http://localhost:8080/update --data-binary "<delete><query>title:abc</query></delete>" -H 'Content-Type: text/xml; charset=utf-8'
curl http://localhost:8080/update --data-binary "<commit/>" -H 'Content-Type: text/xml; charset=utf-8'
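
As an alternative to the raw HTTP calls above, a hedged SolrJ sketch (the core URL and the query values are assumptions for illustration):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SolrDeleteDemo {
    public static void main(String[] args) throws IOException, SolrServerException {
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/demo").build();
        client.deleteByQuery("Name:Turbo"); // delete by query, like the GET/POST examples above
        client.deleteById("1");             // delete a single document by id
        client.commit();
        client.close();
    }
}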

Operation in Java

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class SolrUtils {

    private String serverUrl = "http://localhost:8983/solr/articles";

    public void add(DemoDTO dto) throws IOException, SolrServerException {
        HttpSolrClient client = new HttpSolrClient.Builder(serverUrl).build();
        client.addBean(dto);
        client.commit();
    }

    public List<DemoDTO> search(String keywords, Integer page, Integer rows) throws SolrServerException, IOException {
        HttpSolrClient client = new HttpSolrClient.Builder(serverUrl).build();
        SolrQuery solrQuery = new SolrQuery();
        // keyword query on the title field
        solrQuery.set("q", "title:" + keywords);
        // paging: start is 0-based, rows is the page size
        solrQuery.setStart((Math.max(page, 1) - 1) * rows);
        solrQuery.setRows(rows);

        QueryResponse queryResponse = client.query(solrQuery);
        SolrDocumentList results = queryResponse.getResults();
        long numFound = results.getNumFound();

        List<DemoDTO> dataDTOs = new ArrayList<DemoDTO>();
        for (SolrDocument solrDocument : results) {
            DemoDTO dto = new DemoDTO();
            dto.setName(solrDocument.get("Name").toString());
            dto.setPrice(Double.valueOf(solrDocument.get("Price").toString()));
            dto.setAge(Integer.valueOf(solrDocument.get("Age").toString()));
            dataDTOs.add(dto);
        }
        // List<DemoDTO> dataDTOs = queryResponse.getBeans(DemoDTO.class);
        // System.out.println("sum:" + numFound);
        return dataDTOs;
    }

    public void del() throws SolrServerException, IOException {
        HttpSolrClient client = new HttpSolrClient.Builder(serverUrl).build();
        List<String> names = new ArrayList<String>();
        names.add("david");
        names.add("pam");
        names.add("margot");
        client.deleteById(names);
        client.commit();
    }
}
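
The DemoDTO bean is not shown in the post; a minimal sketch of what addBean/getBeans would expect, assuming the field names from managed-schema above (the property names are guesses):

import org.apache.solr.client.solrj.beans.Field;

public class DemoDTO {

    // @Field maps each bean property to the Solr field defined in managed-schema
    @Field("Name")
    private String name;

    @Field("Age")
    private Integer age;

    @Field("Price")
    private Double price;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }

    public Double getPrice() { return price; }
    public void setPrice(Double price) { this.price = price; }
}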

https://blog.csdn.net/bljbljbljbljblj/article/details/83023125
https://blog.csdn.net/yuanlaijike/article/details/79886025
https://blog.csdn.net/m0_37595732/article/details/72830122
https://blog.csdn.net/long530439142/article/details/79353845
http://lucene.apache.org/solr/guide/7_5/query-screen.html
http://lucene.apache.org/solr/guide/7_5/the-standard-query-parser.html#the-standard-query-parser
