Extract Text From PDF

Here is an example that extracts the text from a PDF file using DDX. The extracted text is written to the destination file as XML.

<!--- The ddx file --->
<cfset ddxfile = Expandpath("doc_text.ddx")>
<!--- The source pdf file --->
<cfset sourcefile1 = Expandpath("pdf-file1.pdf")>
<!--- The destination file --->
<cfset destinationfile = Expandpath("ddx_result_doc_text.xml")>

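<!--- Map the source name used in the DDX file (Doc1) to the input PDF --->
<cfset inputStruct=StructNew()>
<cfset inputStruct.Doc1=sourcefile1>

<!--- Map the result name used in the DDX file (Out1) to the output file --->
<cfset outputStruct=StructNew()>
<cfset outputStruct.Out1=destinationfile>

<!--- Run the DDX instructions; the status of each result is returned in ddxVar --->
<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="ddxVar">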

<cfoutput>The ddx operation was #ddxVar.Out1#</cfoutput><br>

<!--- Read and dump the extracted text only if the DDX operation succeeded --->
<cfif ddxVar.Out1 eq "successful">
   <cffile action="read" file="#destinationfile#" variable="filedata">
   <cfdump var="#filedata#">
</cfif>
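
The contents of doc_text.ddx are not shown in the code above. As a rough sketch only, assuming the file follows the DocumentText pattern from Adobe's ColdFusion DDX examples, it might look something like the following. The names Doc1 and Out1 are taken from the struct keys in the code above. The schemaLocation value is an assumption and may differ in your installation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of doc_text.ddx; the actual downloadable file may differ -->
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
   <!-- result="Out1" must match the key in outputStruct -->
   <DocumentText result="Out1">
      <!-- source="Doc1" must match the key in inputStruct -->
      <PDF source="Doc1"/>
   </DocumentText>
</DDX>
```

The source and result names are the link between the DDX file and the ColdFusion code: if they do not match the keys of inputStruct and outputStruct, the processddx action cannot match the DDX instructions to your files.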

You can download the files using the download button.
