Categories
CFP MAC text analysis text tools

Using textutil on the mac terminal

The following was originally found on https://developer.apple.com

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/textutil.1.html

NAME
textutil — text utility

SYNOPSIS
textutil [command_option] [other_options] file …

DESCRIPTION
textutil can be used to manipulate text files of various formats, using the mechanisms provided by the Cocoa text system.

The first argument indicates the operation to perform, one of:

-help Show the usage information for the command and exit. This is the default command option if none is specified.

-info Display information about the specified files.

-convert fmt Convert the specified files to the indicated format and write each one back to the file system.

-cat fmt Read the specified files, concatenate them, and write the result out as a single file in the indicated format.

fmt is one of: txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive

There are some additional options for general use:

-extension ext Specify an extension to be used for output files (by default, the extension will be determined from the format).

-output path Specify the file name to be used for the first output file.

-stdin Specify that input should be read from stdin rather than from files.

-stdout Specify that the first output file should go to stdout.

-encoding IANA_name | NSStringEncoding
Specify the encoding to be used for plain text or HTML output files (by default, the output encoding will be UTF-8). NSStringEncoding refers to one of the numeric values recognized by NSString. IANA_name refers to an IANA character set name as understood by CFString. The operation will fail if the file cannot be converted to the specified encoding.

-inputencoding IANA_name | NSStringEncoding
Force all plain text input files to be interpreted using the specified encoding (by default, a file’s encoding will be determined from its BOM). The operation will fail if the file cannot be interpreted using the specified encoding.

-format fmt Force all input files to be interpreted using the indicated format (by default, a file’s format will be determined from its contents).

-font font Specify the name of the font to be used for converting plain to rich text.

-fontsize size Specify the size in points of the font to be used for converting plain to rich text.

-- Specify that all further arguments are file names.
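The general options above combine naturally in small shell helpers. The following is a minimal sketch, not part of Apple's man page: it assumes a POSIX shell on macOS with textutil installed, and the function names html_to_rtf and latin1_txt_to_utf8_html are hypothetical. It pipes HTML through -stdin and -stdout, and forces the input and output encodings with IANA names.

```shell
#!/bin/sh
# Sketch built only from the options listed above. Function names and
# file names are hypothetical examples.

# Read HTML from standard input and write RTF to standard output.
# -format html forces the input to be read as HTML instead of leaving
# textutil to guess the format from the contents.
html_to_rtf() {
  textutil -convert rtf -format html -stdin -stdout
}

# Convert a Latin-1 plain text file to UTF-8 HTML next to the original.
# ISO-8859-1 and UTF-8 are IANA character set names; "--" marks the end
# of the options, so a file name starting with "-" is still read as a file.
latin1_txt_to_utf8_html() {
  textutil -convert html -inputencoding ISO-8859-1 -encoding UTF-8 -- "$1"
}

# Example usage on macOS:
#   printf '<b>Hello</b> world\n' | html_to_rtf | pbcopy
#   latin1_txt_to_utf8_html "old notes.txt"   # writes "old notes.html"
```

Because pbcopy stores input that begins with an RTF header as rich text, the first usage line puts formatted text on the clipboard.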

There are some additional options for HTML and WebArchive files:

-noload Do not load subsidiary resources.

-nostore Do not write out subsidiary resources.

-baseurl url Specify a base URL to be used for relative URLs.

-timeout t Specify the time in seconds to wait for resources to load.

-textsizemultiplier x
Specify a numeric factor by which to multiply font sizes.

-excludedelements (tag1, tag2, …)
Specify which HTML elements should not be used in generated HTML (the list should be a single argument, and so will usually need to be quoted in a shell context).

-prefixspaces n Specify the number of spaces by which to indent nested elements in generated HTML (default is 2).
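A minimal sketch of the HTML-specific options, assuming a POSIX shell on macOS. The function name rtf_to_simple_html and the excluded tag names are illustrative assumptions, not recommendations; the point is the quoting that the -excludedelements description asks for.

```shell
#!/bin/sh
# Sketch: convert an RTF file to HTML while leaving out some elements and
# indenting nested elements by 4 spaces instead of the default 2.
# The tag names in the list are only an example.
rtf_to_simple_html() {
  # The parenthesised list is single-quoted so the shell passes it to
  # textutil as one argument, as the option description requires.
  textutil -convert html \
    -excludedelements '(font, span)' \
    -prefixspaces 4 \
    -- "$1"
}

# Example usage on macOS:
#   rtf_to_simple_html foo.rtf   # writes foo.html
```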

There are some additional options for treating metadata:

-strip Do not copy metadata from input files to output files.

-title val Specify the title metadata attribute for output files.

-author val Specify the author metadata attribute for output files.

-subject val Specify the subject metadata attribute for output files.

-keywords (val1, val2, …)
Specify the keywords metadata attribute for output files (the list should be a single argument, and so will usually need to be quoted in a shell context).

-comment val Specify the comment metadata attribute for output files.

-editor val Specify the editor metadata attribute for output files.

-company val Specify the company metadata attribute for output files.

-creationtime yyyy-mm-ddThh:mm:ssZ
Specify the creation time metadata attribute for output files.

-modificationtime yyyy-mm-ddThh:mm:ssZ
Specify the modification time metadata attribute for output files.
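Several metadata options can be set in a single conversion. This sketch assumes a POSIX shell on macOS; the helper names utc_now and to_docx_with_metadata, and every metadata value in the usage line, are hypothetical. It generates the timestamp with date -u so that it matches the yyyy-mm-ddThh:mm:ssZ form shown above.

```shell
#!/bin/sh
# Sketch: convert a file to DOCX with explicit metadata. The function names
# and all metadata values are hypothetical placeholders.

# Current UTC time in the yyyy-mm-ddThh:mm:ssZ form used by -creationtime
# and -modificationtime.
utc_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Usage: to_docx_with_metadata FILE TITLE AUTHOR KEYWORD_LIST
to_docx_with_metadata() {
  textutil -convert docx \
    -title "$2" \
    -author "$3" \
    -keywords "$4" \
    -creationtime "$(utc_now)" \
    -- "$1"
}

# Example usage on macOS (the keyword list is quoted as one argument):
#   to_docx_with_metadata report.txt "Field notes" "A. Author" "(corpus, notes)"
```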

EXAMPLES
textutil -info foo.rtf

displays information about foo.rtf.

textutil -convert html foo.rtf

converts foo.rtf into foo.html.

textutil -convert rtf -font Times -fontsize 10 foo.txt

converts foo.txt into foo.rtf, using Times 10 for the font.

textutil -cat html -title "Several Files" -output index.html *.rtf

loads all RTF files in the current directory, concatenates their contents, and writes the result out as index.html with the HTML title set to “Several Files”.

DIAGNOSTICS
The textutil command exits 0 on success, and 1 on failure.
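Since textutil reports success or failure through its exit status, a script can check each conversion individually. This is a minimal sketch assuming a POSIX shell; the function name convert_all_to_docx is hypothetical.

```shell
#!/bin/sh
# Sketch: convert several files to DOCX and report failures, relying on
# textutil exiting 0 on success and 1 on failure. The function name is a
# hypothetical example.
convert_all_to_docx() {
  failed=0
  for f in "$@"; do
    if ! textutil -convert docx -- "$f"; then
      printf 'failed: %s\n' "$f" >&2
      failed=$((failed + 1))
    fi
  done
  printf '%d file(s) failed\n' "$failed"
  # Return non-zero if any conversion failed, so callers can test it too.
  [ "$failed" -eq 0 ]
}

# Example usage on macOS:
#   convert_all_to_docx *.txt || echo "some conversions failed"
```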

Categories
analysis of language MAC text tools text-analytics

Convert various text file formats in the OS X Terminal with textutil

Originally posted on MacIssues. Copyright MacIssues.

One way to convert a text document to another format is to open it in a text editor such as TextEdit and choose Save As from the File menu to export it. TextEdit offers Word, Rich Text, Plain Text, and OpenDocument Text, among other formats. If you are a Terminal user, however, you may be glad to know that you can do the same conversion right from the command line.

OS X includes a command-line tool called “textutil” that can manipulate supported text documents in a number of ways, one of which is converting a document to a specified format:

1. Open Terminal.
2. Type the following command, replacing FORMAT with the desired format (one of txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive):
textutil -convert FORMAT
3. Type a space after the format, then drag your target document onto the Terminal window, which inserts the document's path. The command should look something like this (here converting a webarchive called “mypage” to docx):
textutil -convert docx ~/Desktop/mypage.webarchive
4. Press Return. The converted file appears in the same folder as the original.

Using the Terminal for this may seem unnecessary when word processors can already convert files, but the command is useful in Automator workflows, shell scripts, and AppleScripts that need to convert documents.

A prime use for this command is batch conversion. If you have a folder of .txt documents that you would like to convert to .docx, you can use a shell wildcard to target all of them, or a group of them, in a single command:

textutil -convert docx ~/Desktop/TextDocuments/*.txt
In the above command, any .txt documents in the folder called “TextDocuments” on the current user’s desktop will be converted to docx format.
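The wildcard command writes each .docx next to its .txt source. If you would rather collect the results in another folder, one option is to loop over the files and give each call its own -output path. This is a sketch assuming a POSIX shell on macOS; the function name txt_dir_to_docx_dir and the folder names are hypothetical.

```shell
#!/bin/sh
# Sketch: batch-convert every .txt file in one folder to DOCX, writing the
# results into a separate folder instead of next to the originals.
# The function name and folder paths are hypothetical examples.
txt_dir_to_docx_dir() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for f in "$src"/*.txt; do
    # If the folder has no .txt files, the pattern stays unexpanded.
    [ -e "$f" ] || continue
    name=$(basename "$f" .txt)
    # One textutil call per file, so -output names that call's only output.
    textutil -convert docx -output "$dest/$name.docx" "$f"
  done
}

# Example usage on macOS:
#   txt_dir_to_docx_dir ~/Desktop/TextDocuments ~/Desktop/Converted
```

Calling textutil once per file keeps -output unambiguous, since that option only names the first output file of a call. Quoting "$f" also keeps file names containing spaces intact.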

Besides format conversion, textutil supports a number of other features, such as specifying the text encoding, setting the font and font size, and modifying file metadata.

Categories
applied linguistics CFP CMC

CFP Multimodality in electronic feedback on writing

Call for papers for a special issue of Writing & Pedagogy (Equinox), Winter 2017 (9:2), titled “Multimodality in electronic feedback on writing”.

The full call is available at:

https://journals.equinoxpub.com/index.php/WAP/announcement/view/130

Categories
analysis of language uk

Wordcloud of the House of Lords Science and Technology Select Committee: EU membership and UK science

HOUSE OF LORDS Science and Technology Select Committee
2nd Report of Session 2015–16
EU membership and UK science

Access the report

Create your own word cloud

Categories
CFP journals

#CFP Dialogue and Discourse journal

From the Corpora-list

Submissions are invited on all topics in the formal, computational, or psycholinguistic study of dialogue and discourse. Submissions received by May 1st will be considered for this issue, which is scheduled to appear in November 2016. Submissions received after this date will be considered for the next regular issue.

Dialogue and Discourse (D&D) is the first peer-reviewed open access journal dedicated exclusively to work that deals with language “beyond the sentence”. The journal adopts an interdisciplinary perspective, accepting work from Linguistics, Computer Science, Psychology, Sociology, Philosophy, and other associated fields with an interest in formally, technically, empirically or experimentally rigorous approaches. We are committed to ensuring the highest editorial standards and rigorous peer review of all submissions, while granting open access to all interested readers. In addition to a semi-annual regular issue, we publish special issues. Since 2010, we have published 41 papers in 3 special issues and 9 regular issues. The journal's h-index is 11, even though most of its papers have been out for less than 3 years.

Submissions are made via the online submission system at http://www.dialogue-and-discourse.org/submission.shtml. Authors are required to indicate if a submission is an extended version of one or more previously published conference paper(s); simultaneous submission to another venue is prohibited. Submissions will undergo rigorous peer-review according to the timeline below. Once accepted and finalised, papers will appear online immediately, as part of the next upcoming issue.

D&D (http://www.dialogue-and-discourse.org) is endorsed by SIGdial, SemDial, and AMLaP. D&D is indexed by the European Reference Index for the Humanities and Social Sciences.

* deadline for submissions May 01
* decision made Sep 01
* revisions due Oct 15
* issue published Nov 15

Dialogue and Discourse Editors

Issue Editor (Spring 2016):
Amanda Stent

Managing Editors:
Raquel Fernandez
Jonathan Ginzburg
David Schlangen

Associate Editors:
Gregory Aist
Matthew Crocker
Barbara Di Eugenio
Danielle Matthews
Rashmi Prasad
Massimo Poesio
Maite Taboada
David Traum

Full editorial board at: http://www.dialogue-and-discourse.org/editors.shtml