

Like Mark Twain before it, reports of the mainframe's death have been greatly exaggerated. The data may be a couple of years old, but as of 2014, IBM counted 97 of the world's top 100 banks still relying on mainframes. With the philosophy of "if it ain't broke, don't fix it," there's little reason for large financial institutions to revisit 30- to 40-year-old transaction processing code if there's no business gain to be had.


Privately held Syncsort is announcing today that it is acquiring U.K. mainframe database tools firm Cogito Ltd. for an undisclosed amount, adding more optimization to its existing DMX-h Hadoop-mainframe ETL and data integration tooling. Aside from EBCDIC, the character encoding for mainframe data, there have been few if any standards like ODBC offering standard targets for ETL tooling.

That doesn't mean that there is a lack of solutions -- Informatica and others have plenty of tools for converting legacy EBCDIC to ASCII, and Hostbridge and others provide converters that snag CICS data into the distributed format of choice. But the idiosyncrasies of mainframe data have made this more art than science. Each source -- from IBM DB2 on z/OS to hierarchical databases like IMS, file systems like VSAM, and transaction processing environments like CICS -- has its own unique behaviors and hooks. And whatever you do, you want to get that data off the mainframe while expending as few MIPS as possible.
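To make the conversion problem concrete, here is a minimal sketch of the character-encoding piece, using Python's built-in codecs and assuming the source uses code page 037 (EBCDIC, US/Canada). This is only the easy part: real mainframe extracts mix text with packed-decimal (COMP-3) and binary fields that must not be run through a text codec, which is exactly the kind of record-layout knowledge the ETL vendors codify.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode an EBCDIC byte string into a Unicode string.

    cp037 is one of several EBCDIC code pages; the right choice
    depends on the mainframe's locale configuration.
    """
    return raw.decode(codepage)

# "HELLO" in EBCDIC cp037: H=0xC8, E=0xC5, L=0xD3, L=0xD3, O=0xD6
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_text(record))  # HELLO
```

In practice a converter reads the COBOL copybook describing each record layout and decodes only the alphanumeric fields this way, handling numeric fields by their declared representation instead.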

The idea of connecting the mainframe to new data targets is hardly new. "Downsizing" or "offloading" mainframe cycles to open systems platforms have been active trends since the days of client/server and the run-up to Y2K. And with Hadoop, the prospect of mining that old mainframe data, or even offloading some applications to commodity, open source platforms, has infused new life into the notion.

Syncsort was born in the mainframe world, but its original claim to fame was optimizing data sorts for reporting and extracts to minimize MIPS. Over the last few years it has extended itself into the more generalized ETL/data integration world, capped by the big data integration of its DMX portfolio. And of course, its DMX-h tool is designed for feeding a variety of mainframe record formats to Hadoop using the visual development environment that you expect from ETL tools. But when it comes to sourcing mainframe data, you can never have too much information, as the formats have plenty of idiosyncrasies that, more often than not, are poorly documented or not documented at all.

That's where Cogito comes in. It already offers a series of management and replication tools for IBM DB2 on z/OS and CA Technologies IDMS. But the market for those tools is pretty mature. Instead, Syncsort is buying the firm for its unique expertise in the internals of those platforms to make DMX-h more efficient. Over the next six to twelve months, the optimizations from Cogito will show up as enhanced DMX-h connection options.

Syncsort is obviously not the only player out there grabbing mainframe data to feed into Hadoop; IBM, Veristorm, Informatica, and others provide tools addressing connectivity, while IT consulting firms Metascale and TCS offer services targeted not just at feeding data for analytics, but at the stretch goal of offloading applications to Hadoop. So it's not news that there's a bulked-up tool for mainframe-Hadoop integration.

With mainframes, the devil is in the details; traditionally, there has been a lot of manual tweaking to get these sources to cooperate. So the key for Syncsort-Cogito is codifying the secret knowledge of how to pry data from sources where documentation is scarce and the people who knew where the skeletons were buried have long since retired.

