PHP Classes

wordDocumentHandler: Convert and clean MSWord documents to HTML

Ratings: 43%
Unique User Downloads: Total 6,990
Download Rankings: All time 280, this week 488 (down)
Version: logantools 1.0.0
License: GNU General Publi...
PHP version: 4
Categories: HTML, Text processing, Windows
Description

This class can be used to convert a Microsoft Word document to HTML, RTF or plain text using COM objects.

The input document formats can be Microsoft Word DOC, RTF and plain text.
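The conversion relies on driving Word through COM. The following is a minimal sketch of that approach, not the class's own code: it assumes PHP's com_dotnet extension and Word's Documents.Open / SaveAs / Close / Quit automation methods, and the helper names wordSaveFormat(), isSupportedInput() and convertWithWord() are hypothetical. The numbers 8, 6 and 2 are Word's wdFormatHTML, wdFormatRTF and wdFormatText save formats.

```php
<?php
// Sketch only: the helper names below are hypothetical, not the
// wordDocumentHandler API.

// Map a short output type to Word's WdSaveFormat value.
// "htm" matches the example's argument; "rtf" and "txt" are assumptions.
function wordSaveFormat($type)
{
    $formats = [
        'htm' => 8, // wdFormatHTML
        'rtf' => 6, // wdFormatRTF
        'txt' => 2, // wdFormatText
    ];
    $type = strtolower($type);
    return isset($formats[$type]) ? $formats[$type] : null;
}

// Accept the input formats named in the description: DOC, RTF, plain text.
function isSupportedInput($path)
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['doc', 'rtf', 'txt'], true);
}

// Open $in in Word through COM and save it as $out in the requested format.
// Returns false when the request is invalid or COM is unavailable
// (non-Windows, or the com_dotnet extension not loaded).
function convertWithWord($in, $out, $type)
{
    $format = wordSaveFormat($type);
    if ($format === null || !isSupportedInput($in) || !class_exists('COM')) {
        return false;
    }
    // Word runs as its own process, so hand it absolute paths rather than
    // paths relative to PHP's working directory.
    $src = realpath($in);
    $dir = realpath(dirname($out));
    if ($src === false || $dir === false) {
        return false;
    }
    $dest = $dir . DIRECTORY_SEPARATOR . basename($out);

    $word = new COM('Word.Application');
    $word->Visible = false;
    try {
        $doc = $word->Documents->Open($src);
        $doc->SaveAs($dest, $format);
        $doc->Close(0); // 0 = wdDoNotSaveChanges
    } finally {
        $word->Quit(0); // always release the Word process
    }
    return true;
}
```

For example, convertWithWord("my doc file.doc", "my doc file.html", "htm") would mirror the example's convertWordDocumentToFile() call. Quitting Word in a finally block keeps a failed conversion from leaving a WINWORD.EXE process behind.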

The class can also clean the generated HTML to remove unnecessary markup that Microsoft Word adds.
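As an illustration of what such cleaning involves, here is a regex-based sketch. It is an assumption about the kind of markup removed (Office conditional comments, namespaced tags such as o:p, Mso classes, mso- style declarations), not a description of the class's cleanWordHTML() options, and cleanWordHtmlSketch() is a hypothetical name.

```php
<?php
// Hypothetical cleanup sketch, NOT the class's cleanWordHTML(): a few
// regular expressions for markup Word commonly adds to saved HTML.
// Regexes are a shortcut; they run over the whole string, text included.
function cleanWordHtmlSketch($html)
{
    // Conditional blocks such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
    $html = preg_replace('/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', '', $html);
    // Namespaced Office tags such as <o:p></o:p> or <st1:place>
    $html = preg_replace('/<\/?\w+:[^>]*>/', '', $html);
    // Word class attributes, quoted or not: class=MsoNormal
    $html = preg_replace('/\s+class=["\']?Mso\w*["\']?/i', '', $html);
    // mso-* declarations inside style attributes
    $html = preg_replace('/mso-[\w-]+:[^;"\']*;?/i', '', $html);
    // style attributes left empty by the previous step
    $html = preg_replace('/\s+style=(["\'])\s*\1/i', '', $html);
    return $html;
}
```

Regular expressions keep the sketch short; a DOM-based cleaner (for instance with PHP's DOMDocument) would be more robust against unusual markup.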

This requires a Windows server with Microsoft Word installed.

If it doesn't work, check the following:

1- Your server must be running Windows (Win32).
2- Microsoft Word must be installed on the server (I tested with Word 2000).
3- file_get_contents() is not available before PHP 4.3.0. With an older PHP, define a replacement before calling the class:
if (!function_exists('file_get_contents'))
{
    function file_get_contents( $f ) {
        // file() keeps each line's newline, so join the lines without adding another
        return implode('', file($f));
    }
}
4- Avoid opening files on the network (e.g. \\server\doc...) unless you fully understand the authentication process.
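Most of the points above can be checked before calling the class. A minimal sketch using only standard PHP functions; wordHandlerProblems() and its messages are hypothetical, and the $os and $hasCom parameters exist only so the check can be exercised on any machine.

```php
<?php
// Hypothetical pre-flight check based on the troubleshooting list: it only
// looks at PHP's side. Whether Word itself is installed is only known
// once new COM('Word.Application') succeeds.
function wordHandlerProblems($path, $os = PHP_OS, $hasCom = null)
{
    if ($hasCom === null) {
        $hasCom = class_exists('COM'); // provided by the com_dotnet extension
    }
    $problems = [];
    // 1- the server must run Windows (PHP_OS is "WINNT" there)
    if (strtoupper(substr($os, 0, 3)) !== 'WIN') {
        $problems[] = 'not running on Windows';
    }
    // 2- Word is driven through COM, so PHP needs the COM class
    if (!$hasCom) {
        $problems[] = 'COM class not available';
    }
    // 4- UNC paths such as \\server\share\doc.doc involve network authentication
    if (substr($path, 0, 2) === '\\\\') {
        $problems[] = 'network path: check authentication';
    }
    return $problems;
}
```

An empty array means PHP's side looks ready, e.g. `if ($p = wordHandlerProblems($myWordFile)) die(implode(', ', $p));`.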

Author: Logan Dugenoux
Classes: 6 packages
Country: France
Age: ???
All time rank: 785 in France
Week rank: 72 (up), 5 in France (equal)

Example

<?php
//########################################################################################
// -------------- Summary
// Example of use of the wordDocumentHandler class
//
// -------------- Author
// Logan Dugenoux - 2003
// logan.dugenoux@netcourrier.com
// http://www.peous.com/logan/
//
// -------------- License
// GPL
//
//########################################################################################

@set_time_limit( 60 ); // cleaning is sometimes very long depending on options

require ("wordDocumentHandler.php");

// ############### Put here the name of a MsWord document ###################
$myWordFile = "my doc file.doc";

// The class
$w = new wordDocumentHandler();

// 1. Convert the document to an HTML string
$txt = $w->convertWordDocumentToString( $myWordFile , "htm" );
if (!$txt)
{
    die( $w->GetLastError() );
}
else
{
    echo "Conversion to string OK. Output length: ".strlen($txt)."<br>";
}

// 2. Strip the extra markup Word adds from $txt
$w->cleanWordHTML( $txt );
echo "Cleaned string length: ".strlen($txt)."<br>";

// 3. Convert the document directly to an HTML file next to the original
$outFile = $myWordFile.".html";
if (!$w->convertWordDocumentToFile( $myWordFile , $outFile , "htm" ))
{
    die( $w->GetLastError() );
}
else
{
    echo "Conversion to file OK.<br>";
}
?>


Files (2)
wordDocumentHandler.php (Class): source of the wordDocumentHandler class
wordDocumentHandler_test_code.php (Example): example of use of the wordDocumentHandler class

The PHP Classes site has supported package installation with the Composer tool since 2013, as described on the site's Composer instructions page.
Version Control: 0%
Unique User Downloads: Total 6,990, this week 0
Download Rankings: All time 280, this week 488 (down)
User Ratings (all time)
Utility: 60%
Consistency: 61%
Documentation: -
Examples: 70%
Tests: -
Videos: -
Overall: 43%
Rank: 3606