zsh on windows

  1. install windows terminal
  2. install git bash
  3. add git bash to windows terminal
{
    "guid": "{1c4de342-38b7-51cf-b940-2309a097f589}",
    "hidden": false,
    "name": "git bash",
    "commandline": "\"D:\\Program Files\\Git\\bin\\bash.exe\" -i -l",
    "historySize": 9001,
    "closeOnExit": true,
    "useAcrylic": true,
    "acrylicOpacity": 0.85,
    "icon": "D:\\tools\\my-git-bash\\git-icon.png",
    "startingDirectory": null
}

1. install zsh from https://packages.msys2.org/package/zsh?repo=msys&variant=x86_64, extract the package, and put it into the git bash directory `C:\Program Files\Git`

2. install oh my zsh

sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

3. to fix a bug with malformed characters from zsh-autosuggestions, pin the plugin to v0.6.4

cd ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions
git checkout tags/v0.6.4 -b v0.6.4-branch

source: https://github.com/zsh-users/zsh-autosuggestions/issues/614

source: https://miaotony.xyz/2020/12/13/Server_Terminal_gitbash_zsh/

gitlab CI/CD with Ansible

there are many options for continuous deployment. one possible approach is to leverage ansible and couple it with gitlab.

so in .gitlab-ci.yml

add a deploy stage, as sketched below.
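a minimal sketch of the stage (the playbook name, inventory, and variable names here are assumptions for illustration):

deploy:
  stage: deploy
  script:
    # hand the freshly built image tag over to ansible
    - ansible-playbook -i inventory deploy.yml --extra-vars "image_tag=$CI_COMMIT_SHORT_SHA"
  only:
    - main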

in that stage, ansible removes the old container and starts the newly built docker image from the previous steps.
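a hypothetical deploy.yml doing exactly that (hosts, container and image names are placeholders; assumes the community.docker collection is installed):

- hosts: app_servers
  tasks:
    - name: remove the old container
      community.docker.docker_container:
        name: my-app
        state: absent

    - name: start the newly built image
      community.docker.docker_container:
        name: my-app
        image: "registry.example.com/my-app:{{ image_tag }}"
        state: started
        restart_policy: always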

git version at runtime and packaging

at first, I was not able to generate the git.properties file. it turned out to be due to a missing version for the plugin.

            <plugin>
                <groupId>pl.project13.maven</groupId>
                <artifactId>git-commit-id-plugin</artifactId>
                <version>4.0.5</version> <!--need to add the specific version-->
                <executions>
                    <execution>
                        <goals>
                            <goal>revision</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

with that, the packaging step can generate the package name with the commit SHA.

<version>1.0-${git.commit.id.abbrev}-SNAPSHOT</version>

the package name then looks like ABC-1.0-7X7X40a-SNAPSHOT.jar.

at runtime, with spring actuator enabled, it automatically picks up the commit info:
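a sketch of the relevant configuration (assuming Spring Boot 2.x property names), so that the /actuator/info endpoint exposes the full git details:

# expose the info endpoint and include full git commit details
management.endpoints.web.exposure.include=info
management.info.git.mode=full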

class override (same full class name)

following up on https://lwpro2.dev/2021/12/28/maven-nested-modules/ and https://lwpro2.dev/2021/12/14/microservies-in-monorepo/, there are times when a class defined upstream (either third party or in an internally shared library) should be overridden.

The trick is to keep the class name exactly the same, including the package name. when maven shades the classes, it picks up the class from the closest level.

For example, if module ACore has a class named org.wordpress.util.DomainMapper, a class with the exact same name org.wordpress.util.DomainMapper can be created in the A1 module to override the provided functionality.
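for illustration, a minimal sketch of the overriding class in A1 (the method and its body are hypothetical, not from ACore):

// module A1: same fully qualified name as ACore's class
package org.wordpress.util;

public class DomainMapper {
    // hypothetical replacement for the upstream behavior
    public String map(String domain) {
        return domain.trim().toLowerCase();
    }
}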

when building module A with -amd, the generated A2 jar contains the compiled classes from ACore, while the generated A1 jar contains the overriding class created in the A1 module.

maven nested modules

following up on https://lwpro2.dev/2021/12/14/microservies-in-monorepo/, there are times when a setup of nested modules is needed.

for example, module A could contain modules A1 and A2.

the changes needed for nested maven modules are similar to the multi-module setup. however, module A now needs to be packaged as

<packaging>pom</packaging>

similarly, the submodules should be included in module A:

<parent>
    <groupId>org.xxx</groupId>
    <artifactId>ParentModule</artifactId>
    <version>1.0-SNAPSHOT</version>
</parent>
<artifactId>ModuleA</artifactId>
<packaging>pom</packaging>

<modules>
    <module>ACore</module>
    <module>A1</module>
    <module>A2</module>
</modules>

among the submodules, if A1 needs to use code from ACore, the dependency scope should be compile, so that the A1 jar is built shaded with the classes from ACore (a shade plugin sketch follows the dependency snippet).

<artifactId>A1</artifactId>

<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
</properties>

<dependencies>
    <dependency>
        <groupId>org.xxx</groupId>
        <artifactId>ACore</artifactId>
        <version>1.0-SNAPSHOT</version>
        <scope>compile</scope>
    </dependency>
</dependencies>
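assuming A1 uses the maven-shade-plugin for the shading, a minimal configuration sketch:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>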

this works out within the IDE. for build or packaging, the command to build the nested jars is

mvn clean package -pl ModuleA -am -amd

-am will include upstream dependencies, for example the Core module,

while -amd will include the nested modules, for example ACore, A1 and A2.

microservices in monorepo

recently I spent some time breaking a monolith project into multiple modules within a single monorepo.

it was a huge codebase with several different projects commingled in the same git repo. after the change, it is now broken into different modules, literally several microservices in a monorepo.

there are several benefits to this change. the key one is that, instead of sharing all the code across every project, each module now builds separately with only the code and dependencies it needs (hence a faster build time, a smaller package, and separation of concerns). at the same time, the common/core code can still be shared and maintained across modules, avoiding the breaking-update and compatibility issues that come with the usual package/jar sharing.

Here is how the project looks before and after.

before: monolith project

after: modules

One of the key changes is the maven multi-module setup.

the parent pom specifies the modules to include (for both compile and runtime packaging), with itself using pom packaging.

Then each microservice/module refers back to its parent module, and includes the shared/core module if needed.

Parent:
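a sketch of what the parent pom could contain (module names are placeholders, following the naming used above):

<groupId>org.xxx</groupId>
<artifactId>ParentModule</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>pom</packaging>

<modules>
    <module>Core</module>
    <module>ModuleA</module>
    <module>ModuleB</module>
</modules>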

microservice/module:
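and a sketch for an individual microservice/module, assuming it depends on the shared Core module:

<parent>
    <groupId>org.xxx</groupId>
    <artifactId>ParentModule</artifactId>
    <version>1.0-SNAPSHOT</version>
</parent>
<artifactId>ModuleA</artifactId>

<dependencies>
    <dependency>
        <groupId>org.xxx</groupId>
        <artifactId>Core</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>
</dependencies>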

then for the IDE and CI/CD, building from the parent module has the submodules executed as well (packaged, for example, when running mvn package).

to build a single module alone, run `mvn $goal -pl moduleA -am` (with $goal being package, for example). this can be triggered either locally, for example through a file watcher for hot reload, or in CI/CD for specific branches or MRs.

gc on old gen

I have a large app currently running with a 600GB max heap (-Xmx). the app now processes at a controlled rate (every half an hour) to avoid an OOM.

each run consumes and processes >3.2 million kafka messages in less than 5 minutes (1 or 2 minutes normally).

even after a lot of tuning, looking at the heap size, the memory footprint keeps climbing. even though the eden space sees frequent (minor) GCs during the <5 minute runs, the old gen keeps growing gradually.

whole heap

Eden

old gen

this really seems like a memory leak.

however, after another thorough check of the code, the collections of objects that are no longer used do get dereferenced.

so if the code is right, then it looks like the gc might not be doing its job on the old gen.

so I then triggered a manual GC, which brought the old gen right down.
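one common way to trigger a full GC on a running JVM, assuming jcmd is available (the pid placeholder is the application's process id):

# request a full GC from outside the application
jcmd <pid> GC.run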

after spending some time looking into this in detail, it turns out the default occupancy ratio to trigger a gc on the old gen is >40%. this is definitely not the ideal setup for this application, which sits idle for 25 minutes out of every 30-minute interval: because the gc waits for a fixed occupancy ratio before triggering, the memory is mostly wasted.

around 120GB was wasted in this case before the manual gc.

it turns out JEP 346, proposed in 2018, aims to tune exactly this.

http://openjdk.java.net/jeps/346

and for now, before running on a JDK with that JEP implemented, leveraging the periodic GC is a much-needed practice instead of leaving it to the gc algorithm alone:

-XX:G1PeriodicGCInterval=600000
-XX:G1PeriodicGCSystemLoadThreshold=LOAD
-XX:-G1PeriodicGCInvokesConcurrent

full gc with the default setting, where it's triggered at the >40% threshold:

jvm memory tuning

a big memory drainer is String objects.

with the object header and the pointer to the backing char array, a minimum of ~20 bytes (varying by java version) is occupied even for an empty string.

this can become an especially big problem if a large volume of messages (like millions of records) is parsed on a single jvm.

from java 8 onwards, there are two ways to handle this (especially for situations where a large amount of data shares, for example, the same headers, like “portfolio”, “name”, “currency”; both the keys/attributes and the values are likely to have a limited/constant number of variants):

1. string interning: this was the only approach before java 8

one caveat though: the JDK's default String.intern() can be slow, due to its native implementation.

an alternative to the native implementation is using a map, which serves the same purpose and is faster, like:

import java.util.concurrent.ConcurrentHashMap;

public final class StringRepo extends ConcurrentHashMap<String, String> {
    public final static StringRepo repo = new StringRepo();

    public String intern(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap rejects null keys
        }
        // use the map itself as the pool, avoiding the native intern
        return computeIfAbsent(s, k -> k);
    }
}
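usage is a drop-in for String.intern(); for example, while parsing messages (the field name is illustrative):

String portfolio = StringRepo.repo.intern(rawPortfolio);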

2. from java 8 (update 20), string deduplication can be used to get the gc's help in reducing the string memory footprint:

-XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics

with GC

https://github.com/FasterXML/jackson-core/issues/726

protobuf NPE

What language does this apply to?
Java

If it’s a proto syntax change, is it for proto2 or proto3?
proto3

If it’s about generated code change, what programming language?
Java

Describe the problem you are trying to solve.
For the message below,

message Position {
    string portfolio = 1;
}

The generated setter would be something like this

      public Builder setPortfolio(
          java.lang.String value) {
        if (value == null) {
         throw new NullPointerException();
        }
  
        portfolio_ = value;
        onChanged();
        return this;
      }

There is an NPE thrown within the method.

I think this is really an opinionated approach; it should instead be left to developers to decide whether to handle null or throw an NPE.
There could be a position message, for example, with many known optional fields which could be null. Developers are in a better position to decide how those fields should be set.

Describe the solution you’d like

The generated class should take the value to be set as it is. Something like

      public Builder setPortfolio(
          java.lang.String value) {
//        if (value == null) {
//         throw new NullPointerException();
//        }
  
        portfolio_ = value;
        onChanged();
        return this;
      }

Describe alternatives you’ve considered

Additional context

I guess the current “opinionated” approach is probably due to a constraint of the protobuf wire format, where an int is used to encode the length-delimited value's length. If not, I think introducing a negative int (-1) could tell whether the following value is really empty (0) or null (-1).

https://github.com/protocolbuffers/protobuf/issues/9207

thread safe sorted map

besides using Collections to get a synchronized version of a `TreeMap`, another approach to get output ordered by sorted keys is to do the sorting at output time using a helper class.

for example, using ObjectMapper

//set the ordering
mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);

then to get the output or transform to another object, using the `mapper`

//output
mapper.writeValueAsString(pairs)

//transform
mapper.convertValue(pairs, Map.class)
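putting it together, a self-contained sketch (the ObjectMapper is Jackson's; the `pairs` map and its contents are made up for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

public class SortedOutput {
    public static void main(String[] args) throws Exception {
        // a thread-safe but unordered map
        Map<String, Integer> pairs = new ConcurrentHashMap<>();
        pairs.put("charlie", 3);
        pairs.put("alpha", 1);
        pairs.put("bravo", 2);

        ObjectMapper mapper = new ObjectMapper();
        // set the ordering: sort map entries by key at serialization time
        mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);

        // prints {"alpha":1,"bravo":2,"charlie":3}
        System.out.println(mapper.writeValueAsString(pairs));
    }
}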