Integrity - website broken link checker

 

Integrity

If you've maintained a website for any length of time, you'll know that links very quickly become broken.

We all move, delete or change pages, and when we do, it not only results in our own internal links breaking, but other people's links to our website becoming broken. Similarly, when other people alter their pages, our own external links become broken.

A broken link on your site is a dead end for your visitors and will also be bad news for your search engine optimisation (SEO).

Unless you enjoy clicking every single link on your site followed by the back button, you'll need a website crawler like Integrity!

Feed it your home page address (URL) and Integrity will follow all of your internal links to find your pages, checking the server response code for all internal and external links found.
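To illustrate the general idea of this kind of crawl, here is a minimal Python sketch. It is not Integrity's own code: the `crawl` function, the `LinkExtractor` class and the `fetch` callable (which is assumed to return a status code and the page's HTML) are all hypothetical names made up for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl starting at start_url.

    fetch(url) is a hypothetical callable returning (status_code, html_text).
    Returns a dict mapping every link found to its status code.
    """
    site = urlsplit(start_url).netloc
    results = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in results:
            continue  # already checked
        status, html = fetch(url)
        results[url] = status
        # External links are checked but not followed; failed pages are not parsed.
        if urlsplit(url).netloc != site or not 200 <= status < 300:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            queue.append(urljoin(url, href))
    return results
```

Passing the fetching step in as a function keeps the sketch independent of any particular HTTP library, and means it can be tried out against a dictionary of fake pages.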

Integrity is donationware, which means that it's available to personal users free of charge with no restrictions. I'm very grateful for donations and if you choose to donate, it will encourage further development of this and other OSX software.


 

System Requirements

Mac OSX 10.3 or higher. (Note that as from v1.4, 10.2 is no longer supported).

 


Mac OSX Download

Download Integrity

Download Integrity v3

Integrity v3 is here

A collection of new features makes it easier to step through your bad links and find them on all of the pages on which they appear. See the version history below for full details.

 


PC Version?

If you're of the Windows persuasion, use Xenu's Link Sleuth. It's the best link checker that I've found, but the developer has made it clear that he's not interested in producing a specific Mac version. I've no connection with Tilman Hausherr (though he seems like a great guy), and this is no more than a personal recommendation to use the Link Sleuth if you're a PC user. Integrity isn't intended to be a Mac version of Xenu's Link Sleuth, but was inspired by it.


Version History

Version 3.1.1

released March 2009

Fixes bug which stopped further crawling if initial page is redirected.

Small efficiency/speed improvement.

Fixes bug which could register incorrect links if a request is redirected more than once.

Version 3.1

released December 2008

Time stamp logged for each link checked
Views are now customisable - show or hide columns as you like. (Exported files reflect visible columns.)
"Redirected" no longer shows in status column as the information is available in its own column
New application icon with less transparency

Version 3.02

released December 2008

Fixes bug related to unquoted href's
Unique page titles option (was new with v3.0 - crawls site faster and more accurately if you set this option and if your page titles *are* unique) now defaults to off for existing configs; defaulting to on was causing confusion.

Version 3.01

released December 2008

Fixes bug preventing proper crawling of framesets
Fixes problem with pause/continue button
Fixes problem with About panel

Version 3

released December 2008

Adds 'Inspect Bad Links' to View menu (opens the first bad link in the link inspector)
Adds 'Next Bad Link' button to link inspector (moves the link inspector to the next bad link if there is one)
Adds two new tools to the toolbar for 'Inspect bad links' and 'Inspect selected link' and a 'Customise Toolbar...' menu item
Adds highlighting feature - double-click an 'On page' entry in the link inspector's list and Integrity will open the selected page and highlight the selected link with a coloured background or coloured border.
Adds drop-down lists to preferences allowing you to choose the style of the highlighting (border / background, style and width of border)
Adds 'Archive pages while crawling' checkbox to preferences (archives pages while crawling - asks you for a save location when crawl is finished).

Version 2.2.2

released September 2008

If the link is around an image rather than text, the 'link text' columns will display [img]: and the alt text of the image.
'Redirected to' column added to flat view.
Changes to button bar including addition of export as html, csv and text (tdl) buttons. Now properly autosaves user customisation.
More information in the status display - now also shows how many bad links have been found

Version 2.2.1

released September 2008

Was generating the 'flat view' multiple times, giving the impression of 'hanging' after crawling large sites using lots of threads. Bug fixed, and progress bar added.

Version 2.2

released July 2008

Server response time is logged. This is the time taken between Integrity sending the request and receiving the first response. This may not reflect the actual server response time if Integrity is running a large number of threads, or if the internet connection is busy.

When Integrity has finished running, a 'flat' view is available, that can be sorted by any of the columns

Global preferences and current config are now combined into one tabbed window

Standard customisable toolbar added and main window rearranged. Stop is now renamed 'Pause'

Version 2.1

released June 2008
Crawls local files (drag the file into the 'starting URL' box)

Version 2.0 (beta)

Architecture / Logic changed. This fixes thread-safety issues (ie v1.x crashing on faster machines when using larger number of threads). Architecture change also makes v2 faster.
Now handles sites built using frames.
Max number of threads increased. This was limited in version 1.6.6 as a quick-fix to thread-safety issues. Max number of threads (when slider is in 'more' position) is now 29, was 7.
'Threads' are no longer really separate threads owned by Integrity, but simultaneous asynchronous requests.
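
As an illustration of what "simultaneous asynchronous requests" with an upper limit can look like, here is a small Python sketch using asyncio. It isn't how Integrity itself is written; `check_all` and the `check` coroutine (assumed to return a status code for a URL) are hypothetical names, and the default limit of 29 is simply the maximum quoted above.

```python
import asyncio


async def check_all(urls, check, limit=29):
    """Run check(url) for every URL, with at most `limit` in flight at once.

    check is a hypothetical coroutine function returning a status code.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # The semaphore caps the number of requests running at the same time.
        async with semaphore:
            return url, await check(url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)
```

Nothing here starts a real thread. The requests overlap because each one yields control while it waits for its response.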

Version 1.6.11

released May 2008
Fixes bug which was causing some links to be skipped on certain pages. Integrity's parser was getting confused sometimes by javascript on pages containing 'less than' and 'greater than' operators.
Other small fixes and efficiencies.

Version 1.6.10

released April 2008
Progress indicators added to export functions.
Link info window now shows all occurrences of a link alongside the link text for each occurrence.

Version 1.6.9

released April 2008
Fixes bug related to trimming which randomly prevented complete crawling of whole site.
Revised handling of incorrectly nested quotes - now correctly allows for apostrophes as part of url ( "/pdf/Educators'_Guide" ).
Help menu now links to support pages of peacockmedia.co.uk, 'Donate' menu option added.

Version 1.6.8

released April 2008
Routines for trimming whitespace, querystring etc rewritten in pure C, improving efficiency.
Better handling of incorrectly nested single/double quotes ( href = "http://..' )
Now correctly handles base href's which don't give a scheme (assumes http://)
Better trimming of whitespace, ie carriage returns and other control characters in unexpected places in the middle of <a ..> tags
Shows how many times a link occurs, not just how many pages it appears on (ie it may appear multiple times on same page).
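
For anyone curious about the base href fix, the idea can be sketched in a few lines of Python. This is only an illustration with a made-up `resolve` function, not the C routines mentioned above, and it doesn't try to cope with a scheme-less base that includes a port number.

```python
from urllib.parse import urljoin, urlsplit


def resolve(href, base_href):
    """Resolve href against a base href, assuming http:// if no scheme is given."""
    if not urlsplit(base_href).scheme:
        # e.g. "www.example.com/docs/" becomes "http://www.example.com/docs/"
        base_href = "http://" + base_href.lstrip("/")
    return urljoin(base_href, href)
```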

Version 1.6.7

released March 2008
Fixes bug which prevented links being found on a page if the end of a comment and an 'end script' tag were adjacent to each other ( --></script> )

Version 1.6.6

released March 2008
Sends user-agent string in header - default is "integrity/1.6" but this can be changed (see Preferences) if your site needs Integrity to appear to be a recognised browser.
Other fixes and efficiencies.

Version 1.6.5

released November 2007
'Whitelists' and 'blacklists' from the config are no longer case-sensitive.
Some problems with MCMS zref fixed. Zrefs are now shown when good links are hidden.
Links which are not checked because they are in the blacklist are treated as good links. They are hidden when good links are hidden and are given no colour label.
"Hide good links" button has now become "Show bad links only". This subtle change means that links which have not been checked will not show and improves running.
Small fixes and efficiencies.
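
A case-insensitive whitelist / blacklist check could look something like the Python sketch below. The `should_check` function and the exact matching rules (simple "URL contains this text" tests, blacklist wins over whitelist) are assumptions made for the example, not a description of Integrity's internals.

```python
def should_check(url, whitelist=(), blacklist=()):
    """Decide whether to check a URL, ignoring case in both lists."""
    lowered = url.lower()
    # Any blacklist match means the link is skipped.
    if any(term.lower() in lowered for term in blacklist):
        return False
    # If a whitelist is given, the URL must contain at least one of its terms.
    if whitelist and not any(term.lower() in lowered for term in whitelist):
        return False
    return True
```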

Version 1.6.4

released October 2007
Fixes problem with tab-delimited file export
Both tab-delimited and comma-separated exports are 'flat', ie each 'on page url' has its own row
Fixes crashes or problems caused by carriage returns or whitespace present within a quoted href (yes, some html has really unexpected features)
Ignores Javascript (anything between <script> tags)
More object retention fixes and small efficiencies

Version 1.6.2

released September 2007
'on page url' will now recognise 'http://peacockmedia.co.uk' and 'http://peacockmedia.co.uk/' as the same link. Therefore a broken link may more correctly be reported on a lower number of pages and the whole application is a little more efficient.
Recognises and reports 'zref' links, a difficult-to-find link inserted by Microsoft Content Management Server
Other small efficiencies and fixes.
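
Treating an address with and without its trailing slash as the same link comes down to normalising the URL before comparing. Here is one possible Python sketch. The `normalise` function is a made-up name and only handles the empty-path case described above.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Give a URL with an empty path the path '/', leaving everything else alone."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this in place both forms of the address produce the same string, so they can share one entry in a table of results.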

Version 1.6.1

released July 2007
Some changes to improve stability

Version 1.6

released July 2007
Adds user-definable colour labels (see Preferences). A 'good link' is defined as server response code 2xx, redirected links include any 3xx code, a bad link is a 4xx code, and an 'error' is a 5xx server code or any other error.
Menu item added View > Info for Current Item (command-I), shows link inspector palette (previously only available via double-click in the main table).
Fixes bug causing crash if no internet connection.
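
The four categories described above map neatly onto a small function. This Python sketch is just an illustration (the `classify` name and the use of `None` for "no response at all" are assumptions), but the ranges follow the definitions given for this version.

```python
def classify(status_code):
    """Map a server response code to one of the four link categories."""
    if status_code is None:
        return "error"  # no response at all, e.g. a timeout
    if 200 <= status_code < 300:
        return "good"
    if 300 <= status_code < 400:
        return "redirected"
    if 400 <= status_code < 500:
        return "bad"
    return "error"  # 5xx or anything unexpected
```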

Version 1.5

released 28 May 2007
Supports base href.
Can now export tab-delimited text file along with CSV, plain text and HTML.
Improved HTML export - link urls are presented as links.
Adds 'Only follow links containing...' field.
Fixes bug allowing some 'commented out' urls to be tested.
Fixes bug preventing inspector window opening when some links double-clicked.
Preferences window added: allows choice of displaying 'on page' as url or page title.
Config Starting URL drop-down list behaviour improved.

Version 1.4.2

released May 21 2007
No longer parses and extracts links from error pages (eg 404 pages).
Now handles spaces in URLs (as long as correctly contained in single or double quotes).

Version 1.4

released April 22 2007
Fixes a problem in some earlier versions which prevented all links being found on some pages
HTML character entities in links are now 'un-encoded' (eg '&amp;' is replaced with '&') before link is checked.
If link appears on more than one page, main table now shows actual number of pages rather than "multiple"
'Re-Check Bad Links' feature added (under File menu)
Fixes problem with export to CSV for some sites.
NB. early copies of 1.4 give the version number as 1.3.1 in about box.
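
As an example of the 'un-encoding' step, Python's standard library can do the same job with `html.unescape`. The `decode_href` wrapper is a hypothetical name used only for this sketch.

```python
import html


def decode_href(raw_href):
    """Turn HTML character entities in an href back into plain characters."""
    return html.unescape(raw_href)
```

An href written in the page source as `page.php?a=1&amp;b=2` needs to be requested as `page.php?a=1&b=2`, which is what the function returns.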

Version 1.3.1

released April 7 2007
Fixes problem with the 'don't check URLs containing' feature which didn't work properly in v1.3
Fixes problem which caused some links to be missed
Small improvement to the stop button

Version 1.3

released April 6 2007
'This page only' checkbox added.
Status display more accurately shows number of links done.
Programme flow, thread safety and object retention improvements. Cures an instability which seemed to be related to websites which have large collections of external links and/or setting a larger number of threads.
Fixes bug preventing some link text from being recorded properly.
For some file types which may be larger files (pdf, mpg, mp3, jpg) the parser no longer sends an http request to check the 'Content-Type', speeding up the crawl time.
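
The extension shortcut can be pictured as a simple test on the URL's path. In this Python sketch the `needs_content_type_check` function is a made-up name, and the set of extensions is just the four examples listed above rather than a complete list.

```python
import posixpath
from urllib.parse import urlsplit

# The file types named above; assumed here to be the full list for illustration.
LARGE_FILE_EXTENSIONS = {".pdf", ".mpg", ".mp3", ".jpg"}


def needs_content_type_check(url):
    """Return False for URLs whose extension already tells us they aren't pages."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension not in LARGE_FILE_EXTENSIONS
```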

Version 1.2

released March 29 2007
Now tolerant to excessively long hrefs (previously hrefs over 1000 characters would break an internal limit and cause the application to crash).
Timeout can now be set in the config window. Using a very large number of threads can obviously make timeouts more likely and so the timeout figure can now be increased accordingly.
The link inspector window (double-click an entry in the main table) now shows the 'on page' list in a form which is clickable. A double-click will open the page in question.
The HTML report now shows the 'on page' column as links to the page in question.

Version 1.1

released March 26 2007
Link text shows up for more links - link text is still only held once regardless of how many instances of that link are found on the site, but if a link has no text (eg image link), then that will not overwrite the existing link text.
Ignores javascript links as well as mailto links.
Fixes bug triggered by a return within the tag.
Fixes bug which could prevent all links being found on certain pages.

Version 1.0

released March 25 2007
First non-beta release, free and not set to expire. Not generally released, but provided to 2 magazine coverdiscs.

Version 0.5 (Beta)

released March 22 2007
Corrected problem which allowed cached data to be checked - new data is now requested every time.
Fixes bug which could prevent some links being found if javascript present in page.

Version 0.4 (Beta)

released March 21 2007
Bug fixed which prevented some relative URLs from being formed correctly
Displays better information about any redirected urls. The final status code shown is the status for the final (redirected to) URL
Link text included as column in main table
Change to programme flow and a number of small refinements and efficiency improvements meaning that the application remains responsive throughout larger crawls.
Bug fixed which prevented some configs saving properly

Version 0.3 (Beta)

released March 7 2007
Improved interface, added 'Continue' button, allows Integrity to be paused and re-started.
Exporting - results can be exported as HTML, CSV or plain text.

Version 0.2 (Beta)

released March 1 2007
Fixes bug preventing Integrity from following links where html is all uppercase.

Version 0.1 (Beta)

released Feb 2007

 