Free AI-powered File Type Identification Tool by Google: Magika

This post covers Magika, a free AI-powered file-type identification tool developed by Google. Magika can detect both binary and textual file types with over 99% accuracy. It relies on a highly optimized deep-learning model designed and trained with Keras. Magika is also integrated with VirusTotal, where it helps improve the efficiency and accuracy of file analysis. It is available as a standalone utility, a Python library, and an experimental npm package.

Magika identifies files precisely within milliseconds thanks to a model that weighs only about 1 MB. This lets Magika run fast even on a single CPU. It supports a wide range of content types and, with 99%+ average precision and recall, outperforms traditional file-identification tools. The initial release does not target polyglot detection, so it is not designed for files that combine multiple content types. Magika is released as part of Google's open-source projects, which you can track here. It is licensed under the Apache 2.0 License and can be found on GitHub.
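To make the "99%+ average precision and recall" figure more concrete, here is a minimal Python sketch of how per-type precision and recall, and their averages across content types, are commonly computed. The labels below are made-up examples, not Magika's evaluation data, and the unweighted (macro) average is an assumption about what "average" means here.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label seen in either list."""
    tp = Counter()  # files correctly predicted as this label
    fp = Counter()  # files predicted as this label that are actually something else
    fn = Counter()  # files of this label that were predicted as something else
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of precision and recall across all content types."""
    n = len(scores)
    avg_precision = sum(p for p, _ in scores.values()) / n
    avg_recall = sum(r for _, r in scores.values()) / n
    return avg_precision, avg_recall


# Hypothetical ground truth and predictions for five files.
truth = ["javascript", "javascript", "html", "pdf", "markdown"]
preds = ["javascript", "html", "html", "pdf", "markdown"]

scores = per_class_precision_recall(truth, preds)
print(macro_average(scores))  # (0.875, 0.875)
```

Macro-averaging weights every content type equally, so a rare type with poor recall pulls the average down as much as a common one would; a frequency-weighted average would favor common types instead.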

AI-powered File Type Identification Tool: Magika

Magika is trained on a dataset of over 25M files across more than 100 content types. You can try the tool online using the web demo. This demo runs the tool locally in your web browser.

The demo starts with a sample file that you can remove and replace with your own. It can process multiple files at once: simply drag and drop your files onto the tool, and it immediately starts processing them and lists the results.

The results list the possible content types on the left with their probabilities on the right. I used a JavaScript file to test the tool, and the screenshot attached above shows the result. As you can see in the screenshot, Magika identified the content type as javascript with 100% probability. It also listed other candidate content types, such as shell, html, markdown, and rst, each with its respective probability. This way, you can use Magika to quickly and accurately identify a file's content type.
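A probability list like the one in the demo is the typical output of a classifier: the model produces a raw score for each content type, and a softmax turns those scores into probabilities that sum to 1. The Python sketch below assumes that setup; the scores are invented for illustration and are not taken from Magika's actual model.

```python
import math


def softmax(scores):
    """Convert raw per-type scores (logits) into probabilities that sum to 1."""
    largest = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - largest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def ranked(probabilities):
    """Sort content types from most to least likely, like a result list."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for a JavaScript file (not real Magika output).
logits = {"javascript": 9.0, "shell": 1.5, "html": 1.0, "markdown": 0.5, "rst": 0.2}

for label, p in ranked(softmax(logits)):
    print(f"{label:<12}{p:7.2%}")
```

Because softmax preserves the order of the raw scores, the ranking is the same before and after; the softmax only rescales the scores so they can be read as probabilities.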

Give it a try here.

Closing Words

Magika is a handy tool for quickly determining the content type of a file without opening it. Because it is available as a command-line tool and a Python API, developers can easily build it into their own workflows. With a simple command, anyone can analyze a file and detect its content type. The results are fast and accurate. Magika does have some limitations, such as the lack of polyglot detection, but these may be addressed over time through open-source development and community contributions.

Free/Paid: Free

