Mergemill Pro easily and quickly converts character encoding and text file formats between TSV, CSV, and XML

Conversion Between Text File Formats

There are many ways to structure data in a text file. Among them, the CSV and tab-delimited formats are in widespread use and can be opened by many kinds of applications, such as spreadsheet and database programs. You should avoid constructing and editing such files by hand. One problem with tab-delimited text files is that tabs are whitespace characters, so it is easy to break the structure by accidentally replacing a tab with a space. In the case of CSV, the comma is such a common character that the specification provides conventions for avoiding delimiter collision (typically enclosing the field in double quotes), so that a comma intended as part of the data is not interpreted as a delimiter. It is thus far better to convert file formats using software like Mergemill Pro.
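
As a rough illustration of delimiter collision, the sketch below uses Python's standard csv module rather than Mergemill Pro itself. The helper names rows_to_csv and csv_to_rows and the sample data are made up for this example. It shows that a field containing a comma is wrapped in double quotes when written, and is read back as a single field.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows to CSV text; fields containing a comma get quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

def csv_to_rows(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))

# Hypothetical sample data: the second row has a comma inside a field.
rows = [["name", "city"], ["Smith, John", "Oslo"]]
text = rows_to_csv(rows)
print(text)
print(csv_to_rows(text) == rows)
```

Editing the same line by hand and forgetting the quotes would split "Smith, John" into two fields, which is the kind of breakage the paragraph above warns about.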

XML (Extensible Markup Language) is one of the most widely used machine-readable formats. For compatibility between database applications, it is often best to convert tab-delimited and CSV files to XML. One important advantage, among many, of using XML is that you may declare the character encoding of the content in the file itself. This makes it much easier to migrate multilingual data. In cases where Unicode data in CSV or TSV format is not recognized correctly by Microsoft Excel, for example, converting it to XML with Mergemill Pro may solve the problem.
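
To make the encoding declaration concrete, here is a minimal sketch in Python using only the standard library. It is not Mergemill Pro's converter: the function name csv_to_xml and the element names "records" and "record" are assumptions chosen for illustration, and the sketch assumes every column header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, encoding="utf-8"):
    """Convert CSV text (with a header row) to XML bytes.

    The output starts with an XML declaration naming the encoding,
    so a reader does not have to guess how the bytes map to characters.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("records")          # hypothetical root element name
    for row in reader:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Assumes each header is usable as an element name.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding=encoding, xml_declaration=True)

xml_bytes = csv_to_xml("name,city\nZo\u00eb,Krak\u00f3w\n")
print(xml_bytes.decode("utf-8"))
```

The xml_declaration argument of ET.tostring requires Python 3.8 or later.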

Conversion Between Text File Character Encodings

To represent textual characters in a file, some sort of mapping is used to assign numeric values to the characters. The mapping varies depending on the character set, which depends on factors like the language being used. Larger character sets generally need more bytes to represent each of their members. Text may be misinterpreted if a computer attempts to read data encoded with a mapping different from the one it expects. So to handle text correctly, some method of identifying the various mappings and converting between them is necessary.
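
A small Python sketch can show what a mismatched mapping looks like in practice. The helper name decode_as is made up for this example. The same character is stored as different bytes under UTF-8 and Latin-1, and reading UTF-8 bytes with the Latin-1 mapping produces garbled text.

```python
def decode_as(data: bytes, encoding: str) -> str:
    """Interpret raw bytes using the given character encoding."""
    return data.decode(encoding)

text = "\u00e9"                          # the single character "é"
utf8_bytes = text.encode("utf-8")        # two bytes: C3 A9
latin1_bytes = text.encode("latin-1")    # one byte: E9

print(utf8_bytes, latin1_bytes)
print(decode_as(utf8_bytes, "utf-8"))    # correct mapping: é
print(decode_as(utf8_bytes, "latin-1"))  # wrong mapping: Ã©
```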

Most character sets and character encoding schemes developed in the past are limited in their coverage, usually supporting just one language or a small set of languages. Multilingual software has traditionally had to implement methods for supporting and identifying multiple character encodings. A simpler solution is to combine the characters for all commonly used languages and symbols into a single universal coded character set. Unicode is such a universal coded character set, and offers the simplest solution to the problem of text representation in multilingual systems. Because Unicode includes the character repertoires of most common character encodings, data can be encoded in a single coded character set.

The Mergemill Pro Advantage

Converting between common text file formats is easy with Mergemill Pro. You simply choose to export data in CSV, XML, or tab-delimited text format. You may also create a custom output format with no more than a few lines of script. Mergemill Pro lets you add a data processing job that uses the new "Convert data file format" output option, and specify a source file or folder along with the output format and location. Mergemill Pro then quickly copies the item names, reads and writes the data values in the proper formats, and converts the text to the encoding you specified.
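
For readers curious what such a format conversion involves, the following is a generic Python sketch of a tab-delimited to CSV conversion. It is not Mergemill Pro's scripting language or job configuration, and the function name tsv_to_csv is hypothetical. The header row (the item names) is copied through, and each value is rewritten with CSV quoting where needed.

```python
import csv
import io

def tsv_to_csv(tsv_text: str) -> str:
    """Re-serialize tab-delimited text as CSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in reader:
        # The first row (item names) and the data rows are handled the
        # same way; values containing a comma are quoted on output.
        writer.writerow(row)
    return buffer.getvalue()

print(tsv_to_csv("id\tnote\n1\thello, world\n"))
```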

Converting between text encodings is also easy. Mergemill Pro lets you specify the datafeed encoding and the output encoding, and it performs the character encoding conversion when generating the output. The Mergemill Pro interface elements, internal data storage, and intermediate files created in running jobs are all in UTF-8 Unicode.
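
The idea of a source encoding and an output encoding can be sketched in a few lines of Python. This is only an illustration of the general technique (decode to Unicode, then encode again), not a description of how Mergemill Pro is implemented, and the function name convert_encoding is an assumption.

```python
def convert_encoding(data: bytes, source_encoding: str, output_encoding: str) -> bytes:
    """Transcode bytes from one character encoding to another.

    The bytes are first decoded to Unicode text using the source
    encoding, then encoded again using the output encoding. Both steps
    raise an error if a byte or character cannot be mapped.
    """
    text = data.decode(source_encoding)
    return text.encode(output_encoding)

latin1_data = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
utf8_data = convert_encoding(latin1_data, "latin-1", "utf-8")
print(utf8_data)
```

Going through Unicode as the intermediate form means one decode step and one encode step are enough for any pair of supported encodings.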

The biggest benefits of using Mergemill Pro are its automation features and its powerful processing capabilities, which let you do far more than simple conversion. You may set up a drop-in folder so that Mergemill Pro automatically processes the files in that folder at scheduled times.

