no code implementations • 19 Dec 2023 • Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu
A customized Scaled-Dot-Product-Attention kernel is designed to match our fusion policy, building on the segment KV cache solution.
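The listing gives only this one-line summary, so the following is a minimal PyTorch sketch of the general idea, not the paper's actual kernel (which is a device-level implementation): keys and values are appended into fixed-size segments allocated on demand (a segment-based KV cache), then gathered for a fused scaled-dot-product attention call. The names `SegmentKVCache`, `seg_len`, and `sdpa_with_cache` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


class SegmentKVCache:
    """Toy segment KV cache: K/V live in fixed-size segments allocated
    on demand, instead of one large pre-allocated contiguous buffer.
    Illustrative sketch only; not the paper's kernel."""

    def __init__(self, num_heads, head_dim, seg_len=64, dtype=torch.float32):
        self.num_heads, self.head_dim, self.seg_len = num_heads, head_dim, seg_len
        self.dtype = dtype
        self.k_segs, self.v_segs = [], []  # each segment: (num_heads, seg_len, head_dim)
        self.length = 0                    # tokens written so far

    def append(self, k, v):
        # k, v: (num_heads, new_tokens, head_dim)
        for t in range(k.shape[1]):
            if self.length % self.seg_len == 0:
                # Current segment is full (or none exists yet): allocate the next one.
                shape = (self.num_heads, self.seg_len, self.head_dim)
                self.k_segs.append(torch.empty(shape, dtype=self.dtype))
                self.v_segs.append(torch.empty(shape, dtype=self.dtype))
            slot = self.length % self.seg_len
            self.k_segs[-1][:, slot] = k[:, t]
            self.v_segs[-1][:, slot] = v[:, t]
            self.length += 1

    def gathered(self):
        # Concatenate segments and trim the unused tail of the last segment.
        k = torch.cat(self.k_segs, dim=1)[:, : self.length]
        v = torch.cat(self.v_segs, dim=1)[:, : self.length]
        return k, v


def sdpa_with_cache(q, cache):
    # q: (num_heads, 1, head_dim) -- a single decode-step query.
    k, v = cache.gathered()
    # PyTorch dispatches to a fused SDPA implementation where available.
    return F.scaled_dot_product_attention(q, k, v)


H, D = 8, 64
cache = SegmentKVCache(num_heads=H, head_dim=D)
cache.append(torch.randn(H, 10, D), torch.randn(H, 10, D))  # cache 10 prompt tokens
out = sdpa_with_cache(torch.randn(H, 1, D), cache)          # -> (H, 1, D)
```

Segment allocation grows the cache in seg_len-sized chunks as generation proceeds, which avoids reserving a maximum-length buffer up front; a production kernel would fuse the gather into the attention computation rather than materializing contiguous K/V as this sketch does.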
1 code implementation • 25 Sep 2023 • Marialena Bevilacqua, Kezia Oketch, Ruiyang Qin, Will Stamey, Xinyuan Zhang, Yi Gan, Kai Yang, Ahmed Abbasi
Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15% higher on average, relative to human-authored documents.